- Description:
RACE is a large-scale reading comprehension dataset with nearly 28,000 passages and nearly 100,000 questions. The dataset is collected from English examinations in China that are designed for middle school and high school students. It can serve as training and test sets for machine comprehension.
Additional Documentation: Explore on Papers With Code
Config description: Builder config for RACE dataset.
Homepage: https://www.cs.cmu.edu/~glai1/data/race/
Source code:
tfds.datasets.race.Builder
Versions:
1.0.0: Initial release.
2.0.0 (default): Add the example id.
Download size:
24.26 MiB
Auto-cached (documentation): Yes
Feature structure:
FeaturesDict({
'answers': Sequence(Text(shape=(), dtype=string)),
'article': Text(shape=(), dtype=string),
'example_id': Text(shape=(), dtype=string),
'options': Sequence(Sequence(Text(shape=(), dtype=string))),
'questions': Sequence(Text(shape=(), dtype=string)),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
answers | Sequence(Text) | (None,) | string | |
article | Text | | string | |
example_id | Text | | string | |
options | Sequence(Sequence(Text)) | (None, None) | string | |
questions | Sequence(Text) | (None,) | string | |
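Each example holds one article together with parallel sequences of questions, option lists, and answers. A minimal sketch of turning one such example into one record per question follows. The helper name `flatten_race_example` is hypothetical, and the code assumes that each entry of `answers` is a letter such as "A" that indexes into the matching `options` list; this page does not state that encoding, so check it against real data before relying on it.

```python
from typing import Any, Dict, List


def _to_str(value: Any) -> str:
    """Decode bytes (as yielded by NumPy-converted examples) to str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def flatten_race_example(example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split one article-level example into one record per question.

    Assumption: answers are letters ("A", "B", ...) that index into the
    option list of the same question.
    """
    article = _to_str(example["article"])
    example_id = _to_str(example["example_id"])
    rows = []
    # questions, options and answers are parallel sequences of equal length.
    for question, options, answer in zip(
        example["questions"], example["options"], example["answers"]
    ):
        rows.append(
            {
                "example_id": example_id,
                "article": article,
                "question": _to_str(question),
                "options": [_to_str(option) for option in options],
                # Map the assumed answer letter to a zero-based option index.
                "answer_index": ord(_to_str(answer)) - ord("A"),
            }
        )
    return rows
```

The per-question layout is usually more convenient for multiple-choice models, at the cost of repeating the article text in every record.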
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
@article{lai2017large,
title={RACE: Large-scale ReAding Comprehension Dataset From Examinations},
author={Lai, Guokun and Xie, Qizhe and Liu, Hanxiao and Yang, Yiming and Hovy, Eduard},
journal={arXiv preprint arXiv:1704.04683},
year={2017}
}
race/high (default config)
Dataset size:
52.39 MiB
Splits:
Split | Examples |
---|---|
'dev' | 1,021 |
'test' | 1,045 |
'train' | 18,728 |
race/middle
Dataset size:
12.51 MiB
Splits:
Split | Examples |
---|---|
'dev' | 368 |
'test' | 362 |
'train' | 6,409 |
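The two configs and their splits can be loaded by name. The sketch below is one possible way to do so, not the catalog's own code: `load_race_split` and `RACE_SPLIT_SIZES` are hypothetical names, and the counts are copied from the split tables on this page. Because the supervised keys are None, the loader leaves `as_supervised` at its default and each example arrives as a dictionary of features.

```python
from typing import Any, Dict, Iterable

# Example counts per split, copied from the split tables in this document.
RACE_SPLIT_SIZES = {
    "high": {"train": 18728, "dev": 1021, "test": 1045},
    "middle": {"train": 6409, "dev": 368, "test": 362},
}


def load_race_split(config: str = "high", split: str = "train") -> Iterable[Dict[str, Any]]:
    """Load one split of one RACE config as an iterable of NumPy dicts."""
    if config not in RACE_SPLIT_SIZES:
        raise ValueError(f"unknown config {config!r}; expected 'high' or 'middle'")
    if split not in RACE_SPLIT_SIZES[config]:
        raise ValueError(f"unknown split {split!r}; expected 'train', 'dev' or 'test'")
    # Imported lazily so the validation above works without TensorFlow installed.
    import tensorflow_datasets as tfds

    # The first call downloads and prepares the data if it is not cached yet.
    dataset = tfds.load(f"race/{config}", split=split)
    # Convert tensors to NumPy values; string features come back as bytes.
    return tfds.as_numpy(dataset)
```

For instance, iterating over `load_race_split("middle", "dev")` should yield dictionaries with the keys `answers`, `article`, `example_id`, `options`, and `questions`.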