xtreme_pawsx
This dataset contains machine translations of the English PAWS training data.
The translations are provided by the XTREME benchmark and cover the following
languages:
- French
- Spanish
- German
- Chinese
- Japanese
- Korean
For further details on PAWS, see the papers "PAWS: Paraphrase Adversaries from
Word Scrambling" (https://arxiv.org/abs/1904.01130) and "PAWS-X: A Cross-lingual
Adversarial Dataset for Paraphrase Identification"
(https://arxiv.org/abs/1908.11828).
For details on XTREME, see "XTREME: A Massively Multilingual Multi-task
Benchmark for Evaluating Cross-lingual Generalization"
(https://arxiv.org/abs/2003.11080).
FeaturesDict({
'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
'sentence1': Text(shape=(), dtype=string),
'sentence2': Text(shape=(), dtype=string),
})
| Feature   | Class        | Shape | Dtype  | Description |
|-----------|--------------|-------|--------|-------------|
|           | FeaturesDict |       |        |             |
| label     | ClassLabel   |       | int64  |             |
| sentence1 | Text         |       | string |             |
| sentence2 | Text         |       | string |             |
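The feature structure above can be read with TensorFlow Datasets roughly as follows. This is a minimal sketch, assuming the `tensorflow_datasets` package is installed and the data can be downloaded. The helper names `dataset_name` and `load_pairs` are hypothetical and exist only for illustration. Examples are read as dictionaries keyed by feature name rather than through `as_supervised`.

```python
# Language codes of the configs listed on this page.
CONFIGS = ("de", "es", "fr", "ja", "ko", "zh")


def dataset_name(lang: str = "de") -> str:
    """Return the TFDS name for a language config, e.g. 'xtreme_pawsx/de'."""
    if lang not in CONFIGS:
        raise ValueError(f"unknown config {lang!r}; expected one of {CONFIGS}")
    return f"xtreme_pawsx/{lang}"


def load_pairs(lang: str = "de", limit: int = 3):
    """Yield (sentence1, sentence2, label) tuples from the 'train' split."""
    # Imported lazily so dataset_name() works without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load(dataset_name(lang), split="train")
    # tfds.as_numpy yields dicts of NumPy values; Text features arrive as bytes.
    for example in tfds.as_numpy(ds.take(limit)):
        yield (
            example["sentence1"].decode("utf-8"),
            example["sentence2"].decode("utf-8"),
            int(example["label"]),
        )
```

Calling `list(load_pairs("fr", limit=2))` would trigger the download on first use and return two sentence pairs with their integer labels.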
@article{hu2020xtreme,
author = {Junjie Hu and Sebastian Ruder and Aditya Siddhant and Graham Neubig and Orhan Firat and Melvin Johnson},
title = {XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization},
journal = {CoRR},
volume = {abs/2003.11080},
year = {2020},
archivePrefix = {arXiv},
eprint = {2003.11080}
}
xtreme_pawsx/de (default config)
| Split   | Examples |
|---------|----------|
| 'train' | 49,340   |
xtreme_pawsx/es
| Split   | Examples |
|---------|----------|
| 'train' | 49,244   |
xtreme_pawsx/fr
| Split   | Examples |
|---------|----------|
| 'train' | 49,208   |
xtreme_pawsx/ja
| Split   | Examples |
|---------|----------|
| 'train' | 49,086   |
xtreme_pawsx/ko
| Split   | Examples |
|---------|----------|
| 'train' | 49,298   |
xtreme_pawsx/zh
| Split   | Examples |
|---------|----------|
| 'train' | 49,149   |
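The train splits differ slightly in size from one config to the next, which matters when mixing languages or planning epochs. The sketch below only does arithmetic on the counts listed on this page. The helper names and the batch size of 32 are assumptions chosen for illustration, not part of the dataset.

```python
# Train-split sizes copied from the per-config tables on this page.
TRAIN_EXAMPLES = {
    "de": 49_340,
    "es": 49_244,
    "fr": 49_208,
    "ja": 49_086,
    "ko": 49_298,
    "zh": 49_149,
}


def summarize(sizes):
    """Return (total examples, largest config, smallest config)."""
    total = sum(sizes.values())
    largest = max(sizes, key=sizes.get)
    smallest = min(sizes, key=sizes.get)
    return total, largest, smallest


def steps_per_epoch(num_examples, batch_size=32):
    """Number of batches needed to cover every example once (ceiling division)."""
    return -(-num_examples // batch_size)


total, largest, smallest = summarize(TRAIN_EXAMPLES)
print(total, largest, smallest)  # 295325 de ja
```

With the assumed batch size of 32, `steps_per_epoch(TRAIN_EXAMPLES["de"])` evaluates to 1542.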
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2022-12-06 UTC.
Homepage: https://github.com/google-research/xtreme
Source code: tfds.text.xtreme_pawsx.XtremePawsx
(https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/xtreme_pawsx/xtreme_pawsx.py)
Versions: 1.0.0 (default): No release notes.
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc at
https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.
| Config                    | Config description | Download size | Dataset size |
|---------------------------|--------------------|---------------|--------------|
| xtreme_pawsx/de (default) | Translated to de   | 22.34 MiB     | 14.19 MiB    |
| xtreme_pawsx/es           | Translated to es   | 22.27 MiB     | 14.09 MiB    |
| xtreme_pawsx/fr           | Translated to fr   | 22.70 MiB     | 14.53 MiB    |
| xtreme_pawsx/ja           | Translated to ja   | 25.12 MiB     | 16.98 MiB    |
| xtreme_pawsx/ko           | Translated to ko   | 22.99 MiB     | 14.86 MiB    |
| xtreme_pawsx/zh           | Translated to zh   | 21.45 MiB     | 13.21 MiB    |