- Description:
QuALITY, a multiple-choice, long-reading comprehension dataset.
We provide only the raw version.
Homepage: https://github.com/nyu-mll/quality
- Source code: tfds.datasets.quality.Builder
- Versions:
  - 1.0.0 (default): Initial release.
- Download size: 17.26 MiB
- Auto-cached (documentation): Yes
- Splits:

| Split | Examples |
|---|---|
| 'dev' | 230 |
| 'test' | 232 |
| 'train' | 300 |

- Feature structure:
FeaturesDict({
'article': Text(shape=(), dtype=string),
'article_id': Text(shape=(), dtype=string),
'difficults': Sequence(bool),
'gold_labels': Sequence(int32),
'options': Sequence(Sequence(Text(shape=(), dtype=string))),
'question_ids': Sequence(Text(shape=(), dtype=string)),
'questions': Sequence(Text(shape=(), dtype=string)),
'set_unique_id': Text(shape=(), dtype=string),
'source': Text(shape=(), dtype=string),
'title': Text(shape=(), dtype=string),
'topic': Text(shape=(), dtype=string),
'url': Text(shape=(), dtype=string),
'writer_id': Text(shape=(), dtype=string),
'writer_labels': Sequence(int32),
})
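Each record is one article. The per-question fields (questions, question_ids, options, gold_labels, writer_labels, difficults) are parallel sequences indexed by question.

The sketch below flattens one article record into one example per question. It rests on three assumptions:

- The record has already been converted to plain Python strings and lists. Records read through TFDS arrive as bytes and NumPy arrays instead.
- gold_labels are 1-based option indices, as in the original QuALITY JSON release.
- The helper name flatten_article and the label_base parameter are hypothetical.

```python
from typing import Any, Dict, List


def flatten_article(record: Dict[str, Any], label_base: int = 1) -> List[Dict[str, Any]]:
    """Split one article-level record into one dict per question.

    label_base is an assumption: use 1 if gold_labels are 1-based
    (as in the original QuALITY JSON files) or 0 if they are already 0-based.
    """
    examples = []
    # zip walks the parallel per-question sequences in lockstep
    # (note: it silently stops at the shortest one).
    for qid, question, options, gold, hard in zip(
        record["question_ids"],
        record["questions"],
        record["options"],
        record["gold_labels"],
        record["difficults"],
    ):
        examples.append(
            {
                "article_id": record["article_id"],
                "question_id": qid,
                "question": question,
                "options": list(options),
                "label": gold - label_base,  # 0-based index into options
                "hard": bool(hard),  # passed through from difficults
            }
        )
    return examples


# A tiny hand-made record using the same field names (values are invented).
toy = {
    "article_id": "a1",
    "question_ids": ["q1", "q2"],
    "questions": ["Who narrates?", "Where is it set?"],
    "options": [["A", "B", "C", "D"], ["Mars", "Venus", "Earth", "Moon"]],
    "gold_labels": [2, 1],
    "difficults": [False, True],
}
for ex in flatten_article(toy):
    print(ex["question_id"], ex["options"][ex["label"]], ex["hard"])
```

Because supervised keys are not defined for this dataset, a per-question view like this is one way to build (input, label) pairs yourself. Keeping only examples with ex["hard"] set restricts them to the questions flagged in difficults.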
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| article | Text | | string | |
| article_id | Text | | string | |
| difficults | Sequence(Tensor) | (None,) | bool | |
| gold_labels | Sequence(Tensor) | (None,) | int32 | |
| options | Sequence(Sequence(Text)) | (None, None) | string | |
| question_ids | Sequence(Text) | (None,) | string | |
| questions | Sequence(Text) | (None,) | string | |
| set_unique_id | Text | | string | |
| source | Text | | string | |
| title | Text | | string | |
| topic | Text | | string | |
| url | Text | | string | |
| writer_id | Text | | string | |
| writer_labels | Sequence(Tensor) | (None,) | int32 | |
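In the Feature documentation table, the (None,) shape means one entry per question. The (None, None) shape of options means a variable-length list of answer strings per question.

The sketch below checks that invariant, plus the range of each gold label. It assumes the following:

- The record has already been converted to Python lists.
- All six per-question fields are populated.
- Gold labels are 1-based (label_base).
- The function name check_alignment is hypothetical.

```python
from typing import Any, Dict, List

# Fields documented with shape (None,) or (None, None): one entry per question.
PER_QUESTION_FIELDS = [
    "questions",
    "question_ids",
    "options",
    "gold_labels",
    "writer_labels",
    "difficults",
]


def check_alignment(record: Dict[str, Any], label_base: int = 1) -> List[str]:
    """Return a list of problems; an empty list means the record looks consistent."""
    problems = []
    n = len(record["questions"])
    for name in PER_QUESTION_FIELDS:
        if len(record[name]) != n:
            problems.append(f"{name} has {len(record[name])} entries, expected {n}")
    # Each gold label must point at an existing option of its question
    # (label_base=1 assumes 1-based labels).
    for i, (opts, gold) in enumerate(zip(record["options"], record["gold_labels"])):
        if not 0 <= gold - label_base < len(opts):
            problems.append(f"question {i}: gold label {gold} out of range")
    return problems


good = {
    "questions": ["Q1"],
    "question_ids": ["q1"],
    "options": [["A", "B"]],
    "gold_labels": [1],
    "writer_labels": [1],
    "difficults": [False],
}
print(check_alignment(good))  # → []
print(check_alignment(dict(good, gold_labels=[])))
```

The check does not assume a fixed number of options per question, which matches the unknown inner dimension in (None, None).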
- Supervised keys (See as_supervised doc): None
- Figure (tfds.show_examples): Not supported.
- Citation:
@article{pang2021quality,
title={ {QuALITY}: Question Answering with Long Input Texts, Yes!},
author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
journal={arXiv preprint arXiv:2112.08608},
year={2021}
}
quality/raw (default config)
- Config description: Raw with HTML.
- Dataset size: 22.18 MiB
quality/stripped
- Config description: Stripped of HTML.
- Dataset size: 20.73 MiB
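The raw and stripped configs differ in whether the article text keeps its HTML markup. The sketch below shows one way to get plain text from a raw article with Python's standard-library html.parser.

It is an illustration only, not necessarily how the TFDS builder produces the stripped config. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags; convert_charrefs decodes entities like &amp;."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(raw: str) -> str:
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())


print(strip_html("<p>The <i>ship</i> landed &amp; waited.</p>\n<p>Then silence.</p>"))
# → The ship landed & waited. Then silence.
```

The sketch makes two trade-offs:

- Collapsing whitespace discards paragraph breaks. Keep "".join(parser.parts) unmodified if those matter.
- The extractor keeps the contents of every element, including any script or style blocks.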