xtreme_xnli
This dataset contains machine translations of MNLI into each of the XNLI
languages. The translation data is provided by XTREME. Note that this is
different from the machine translated data provided by the original XNLI paper.
Splits:

| Split     | Examples |
|-----------|----------|
| `'train'` | 392,570  |
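Since `'train'` is the only split listed, a held-out slice has to be carved from it. The following is a minimal sketch, assuming the TFDS percent-slicing syntax for split strings and `tfds.load` with a list of splits; the helper names `train_holdout_splits` and `load_train_and_holdout` and the default of 10% are illustrative, not part of the dataset.

```python
def train_holdout_splits(holdout_percent=10):
    """Return two TFDS split strings that partition 'train' by percentage."""
    if not isinstance(holdout_percent, int) or not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be an integer between 1 and 99")
    cut = 100 - holdout_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_train_and_holdout(holdout_percent=10):
    """Load xtreme_xnli as (train, holdout) datasets (hypothetical helper)."""
    # Imported lazily so the split helper above works without TFDS installed.
    import tensorflow_datasets as tfds

    train_spec, holdout_spec = train_holdout_splits(holdout_percent)
    # Passing a list of split strings returns one dataset per entry.
    return tfds.load("xtreme_xnli", split=[train_spec, holdout_spec])
```

Calling `load_train_and_holdout()` triggers the full download on first use, so it may be worth checking the split strings with `train_holdout_splits` first.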
Feature structure:

    FeaturesDict({
        'hypothesis': TranslationVariableLanguages({
            'language': Text(shape=(), dtype=string),
            'translation': Text(shape=(), dtype=string),
        }),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
        'premise': Translation({
            'ar': Text(shape=(), dtype=string),
            'bg': Text(shape=(), dtype=string),
            'de': Text(shape=(), dtype=string),
            'el': Text(shape=(), dtype=string),
            'en': Text(shape=(), dtype=string),
            'es': Text(shape=(), dtype=string),
            'fr': Text(shape=(), dtype=string),
            'hi': Text(shape=(), dtype=string),
            'ru': Text(shape=(), dtype=string),
            'sw': Text(shape=(), dtype=string),
            'th': Text(shape=(), dtype=string),
            'tr': Text(shape=(), dtype=string),
            'ur': Text(shape=(), dtype=string),
            'vi': Text(shape=(), dtype=string),
            'zh': Text(shape=(), dtype=string),
        }),
    })
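The two text features have different layouts: `premise` is a mapping keyed by language code, while `hypothesis` holds parallel `language` and `translation` sequences. The sketch below shows one way to line them up per language. It assumes each example has already been converted to plain Python strings (for instance after `tfds.as_numpy` and decoding bytes); the function name `pair_by_language` and the toy sentences are illustrative, not taken from the dataset.

```python
def pair_by_language(example):
    """Map each language code to its (premise, hypothesis) pair."""
    premises = example["premise"]
    hyp = example["hypothesis"]
    # 'language' and 'translation' are aligned position by position.
    hypotheses = dict(zip(hyp["language"], hyp["translation"]))
    # Keep only languages present on both sides.
    return {
        lang: (premises[lang], hypotheses[lang])
        for lang in sorted(premises)
        if lang in hypotheses
    }


# Toy example with made-up sentences and only two languages.
example = {
    "premise": {"en": "A man is sleeping.", "fr": "Un homme dort."},
    "hypothesis": {
        "language": ["en", "fr"],
        "translation": ["A person rests.", "Une personne se repose."],
    },
    "label": 0,
}

pairs = pair_by_language(example)
```

The integer `label` is left untouched here, since this page does not state which class name each of the three values corresponds to.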
Feature documentation:

| Feature                | Class                        | Shape | Dtype  | Description |
|------------------------|------------------------------|-------|--------|-------------|
|                        | FeaturesDict                 |       |        |             |
| hypothesis             | TranslationVariableLanguages |       |        |             |
| hypothesis/language    | Text                         |       | string |             |
| hypothesis/translation | Text                         |       | string |             |
| label                  | ClassLabel                   |       | int64  |             |
| premise                | Translation                  |       |        |             |
| premise/ar             | Text                         |       | string |             |
| premise/bg             | Text                         |       | string |             |
| premise/de             | Text                         |       | string |             |
| premise/el             | Text                         |       | string |             |
| premise/en             | Text                         |       | string |             |
| premise/es             | Text                         |       | string |             |
| premise/fr             | Text                         |       | string |             |
| premise/hi             | Text                         |       | string |             |
| premise/ru             | Text                         |       | string |             |
| premise/sw             | Text                         |       | string |             |
| premise/th             | Text                         |       | string |             |
| premise/tr             | Text                         |       | string |             |
| premise/ur             | Text                         |       | string |             |
| premise/vi             | Text                         |       | string |             |
| premise/zh             | Text                         |       | string |             |
@article{hu2020xtreme,
author = {Junjie Hu and Sebastian Ruder and Aditya Siddhant and Graham Neubig and Orhan Firat and Melvin Johnson},
title = {XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization},
journal = {CoRR},
volume = {abs/2003.11080},
year = {2020},
archivePrefix = {arXiv},
eprint = {2003.11080}
}
Last updated 2022-12-06 UTC.
Homepage: https://www.nyu.edu/projects/bowman/xnli/
Source code: tfds.text.xtreme_xnli.XtremeXnli (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/xtreme_xnli/xtreme_xnli.py)
Versions: 1.1.0 (default): No release notes.
Download size: 2.31 GiB
Dataset size: 1.59 GiB
Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching): No
Supervised keys (see as_supervised doc: https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples: https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples): Not supported.