race
RACE is a large-scale reading comprehension dataset with just under 28,000
passages and nearly 100,000 questions. The passages and questions were
collected from English examinations in China designed for middle school and
high school students, and the dataset can serve as training and test sets for
machine comprehension.
FeaturesDict({
'answers': Sequence(Text(shape=(), dtype=string)),
'article': Text(shape=(), dtype=string),
'example_id': Text(shape=(), dtype=string),
'options': Sequence(Sequence(Text(shape=(), dtype=string))),
'questions': Sequence(Text(shape=(), dtype=string)),
})
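When the features above are read through `tfds.as_numpy`, `Text` values typically arrive as Python `bytes` and `Sequence` features as NumPy arrays of bytes. The sketch below assumes that behavior and recursively decodes one example into plain `str` values and nested lists. `decode_text` and `decode_example` are hypothetical helpers, not part of TFDS. The commented lines show one possible way to obtain raw examples, assuming `tensorflow_datasets` is installed and the data can be downloaded.

```python
from typing import Any, Dict


def decode_text(value: Any) -> Any:
    """Recursively turn bytes into str, keeping the nesting of sequences.

    Hypothetical helper: assumes string features come back as bytes, as
    tf.string values do when converted with tfds.as_numpy.
    """
    if isinstance(value, bytes):  # also covers numpy.bytes_
        return value.decode("utf-8")
    if hasattr(value, "tolist"):  # NumPy arrays/scalars -> Python objects
        return decode_text(value.tolist())
    if isinstance(value, (list, tuple)):
        return [decode_text(item) for item in value]
    return value


def decode_example(example: Dict[str, Any]) -> Dict[str, Any]:
    """Decode every feature (article, questions, options, ...) of one example."""
    return {name: decode_text(feature) for name, feature in example.items()}


# Possible usage, assuming tensorflow_datasets is installed (not run here):
# import tensorflow_datasets as tfds
# ds = tfds.load("race/high", split="dev")
# for raw in tfds.as_numpy(ds.take(1)):
#     example = decode_example(raw)
#     print(example["questions"][0], example["answers"][0])
```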
| Feature    | Class                    | Shape        | Dtype  | Description |
|------------|--------------------------|--------------|--------|-------------|
|            | FeaturesDict             |              |        |             |
| answers    | Sequence(Text)           | (None,)      | string |             |
| article    | Text                     |              | string |             |
| example_id | Text                     |              | string |             |
| options    | Sequence(Sequence(Text)) | (None, None) | string |             |
| questions  | Sequence(Text)           | (None,)      | string |             |
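The shapes in the table imply that one example holds a single article plus parallel sequences: `questions` and `answers` have shape `(None,)`, and `options` has shape `(None, None)`, one list of choices per question. For multiple-choice training it is common to expand each example into one record per question. The sketch below assumes the text is already decoded to `str` and that each answer is a letter such as `"A"` indexing into that question's options, as in the original RACE release; `to_question_records` is a hypothetical helper and the toy example is invented for illustration.

```python
from typing import Any, Dict, List


def to_question_records(example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand one RACE example (one article, many questions) into
    one record per question.

    Assumptions: strings are already decoded, and each answer is an
    option letter ("A", "B", ...) indexing into that question's options.
    """
    questions = example["questions"]
    options = example["options"]
    answers = example["answers"]
    # Entry i of each sequence belongs to question i.
    if not (len(questions) == len(options) == len(answers)):
        raise ValueError("questions, options and answers must have equal length")

    records = []
    for i, (question, choices, answer) in enumerate(zip(questions, options, answers)):
        letter = answer.strip().upper()
        label = ord(letter) - ord("A") if len(letter) == 1 else -1
        if not 0 <= label < len(choices):
            raise ValueError(f"question {i}: answer {answer!r} does not index an option")
        records.append(
            {
                "example_id": example["example_id"],
                "article": example["article"],
                "question": question,
                "options": list(choices),
                "label": label,  # 0-based index of the correct option
            }
        )
    return records


toy = {  # hypothetical toy example, not taken from RACE
    "example_id": "toy-1",
    "article": "Tom walked to school on Monday.",
    "questions": ["Who walked?", "When?"],
    "options": [["Ann", "Tom", "Sue", "Bob"], ["Sun", "Sat", "Fri", "Mon"]],
    "answers": ["B", "D"],
}
print([r["label"] for r in to_question_records(toy)])  # [1, 3]
```

Mapping the letter to an integer `label` keeps each record independent of how many options its question has.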
@article{lai2017large,
title={RACE: Large-scale ReAding Comprehension Dataset From Examinations},
author={Lai, Guokun and Xie, Qizhe and Liu, Hanxiao and Yang, Yiming and Hovy, Eduard},
journal={arXiv preprint arXiv:1704.04683},
year={2017}
}
race/high (default config)
Dataset size: 52.39 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'dev'     | 1,021    |
| 'test'    | 1,045    |
| 'train'   | 18,728   |
race/middle
Dataset size: 12.51 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'dev'     | 368      |
| 'test'    | 362      |
| 'train'   | 6,409    |
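The split tables can be cross-checked against the dataset description with a few lines of arithmetic. This sketch assumes each example corresponds to one article with its questions (as the feature structure suggests), so the totals count passages; the numbers are copied from the race/high and race/middle tables above.

```python
# Example counts copied from the race/high and race/middle split tables.
SPLITS = {
    "race/high": {"dev": 1_021, "test": 1_045, "train": 18_728},
    "race/middle": {"dev": 368, "test": 362, "train": 6_409},
}


def config_total(counts):
    """Number of examples across all splits of one config."""
    return sum(counts.values())


for config, counts in SPLITS.items():
    total = config_total(counts)
    print(f"{config}: {total:,} examples, {counts['train'] / total:.1%} in train")

grand_total = sum(config_total(counts) for counts in SPLITS.values())
print(f"both configs: {grand_total:,} examples")  # both configs: 27,933 examples
```

The total is slightly below 28,000, so if every example is one passage, the two configs together hold just under 28,000 passages, with roughly 90% of each config in the training split.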
Dataset details (both configs):
Config description: Builder config for RACE dataset.
Homepage: https://www.cs.cmu.edu/~glai1/data/race/
Source code: tfds.datasets.race.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/race/race_dataset_builder.py)
Versions:
  1.0.0: Initial release.
  2.0.0 (default): Add the example id.
Download size: 24.26 MiB
Auto-cached: Yes
Supervised keys: None
Figure (tfds.show_examples): Not supported.
Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/race)
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-12-20 UTC.