quality

  • Description:

QuALITY is a multiple-choice reading-comprehension dataset built on long input texts.

We provide only the raw version.

  • Splits:
Split    Examples
'dev'    230
'test'   232
'train'  300
  • Feature structure:
FeaturesDict({
    'article': Text(shape=(), dtype=string),
    'article_id': Text(shape=(), dtype=string),
    'difficults': Sequence(bool),
    'gold_labels': Sequence(int32),
    'options': Sequence(Sequence(Text(shape=(), dtype=string))),
    'question_ids': Sequence(Text(shape=(), dtype=string)),
    'questions': Sequence(Text(shape=(), dtype=string)),
    'set_unique_id': Text(shape=(), dtype=string),
    'source': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
    'topic': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
    'writer_id': Text(shape=(), dtype=string),
    'writer_labels': Sequence(int32),
})
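Each example holds one article together with several per-question sequences. The following is a minimal sketch of flattening one example into per-question records. It assumes the sequences are index-aligned (the structure suggests this but does not state it) and that the example has already been converted to plain Python values. The helper name `flatten_example` and the sample values are hypothetical, not part of the dataset or of TFDS.

```python
from typing import Any, Dict, Iterator


def flatten_example(example: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one record per question from a single article-level example.

    Assumption: 'question_ids', 'questions', 'options', 'gold_labels',
    'writer_labels' and 'difficults' are index-aligned sequences.
    """
    per_question = zip(
        example["question_ids"],
        example["questions"],
        example["options"],
        example["gold_labels"],
        example["writer_labels"],
        example["difficults"],
    )
    for qid, question, options, gold, writer, hard in per_question:
        yield {
            "article_id": example["article_id"],
            "question_id": qid,
            "question": question,
            "options": list(options),
            # Labels are passed through as stored; no indexing base is assumed.
            "gold_label": int(gold),
            "writer_label": int(writer),
            "difficult": bool(hard),
        }


# Made-up values for illustration only; they are not taken from the dataset.
sample = {
    "article_id": "a1",
    "question_ids": ["a1_q1", "a1_q2"],
    "questions": ["Who arrives first?", "Why does the ship turn back?"],
    "options": [["A", "B", "C", "D"], ["E", "F", "G", "H"]],
    "gold_labels": [2, 4],
    "writer_labels": [2, 3],
    "difficults": [False, True],
}

records = list(flatten_example(sample))
```

Keeping `article_id` on every record makes it possible to join a question back to its article text without copying the long `article` string into each row.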
  • Feature documentation:
Feature        Class                     Shape         Dtype   Description
               FeaturesDict
article        Text                                    string
article_id     Text                                    string
difficults     Sequence(Tensor)          (None,)       bool
gold_labels    Sequence(Tensor)          (None,)       int32
options        Sequence(Sequence(Text))  (None, None)  string
question_ids   Sequence(Text)            (None,)       string
questions      Sequence(Text)            (None,)       string
set_unique_id  Text                                    string
source         Text                                    string
title          Text                                    string
topic          Text                                    string
url            Text                                    string
writer_id      Text                                    string
writer_labels  Sequence(Tensor)          (None,)       int32
  • Citation:
@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}

quality/raw (default config)

  • Config description: Raw with HTML.

  • Dataset size: 22.18 MiB

quality/stripped

  • Config description: Stripped of HTML.

  • Dataset size: 20.73 MiB
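The two configs and three splits listed on this page can be selected by name when loading. The sketch below assumes `tensorflow_datasets` is installed. The wrapper `load_quality` and the constants `VALID_CONFIGS` and `VALID_SPLITS` are hypothetical helpers introduced here for illustration; only `tfds.load` is the library's own call.

```python
VALID_CONFIGS = ("raw", "stripped")      # config names listed on this page
VALID_SPLITS = ("train", "dev", "test")  # split names listed on this page


def load_quality(config: str = "raw", split: str = "train"):
    """Load one split of one QuALITY config through tfds.load.

    Names are validated before TFDS is imported, so a typo fails
    immediately instead of after a slow import.
    """
    if config not in VALID_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {VALID_CONFIGS}")
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {VALID_SPLITS}")

    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(f"quality/{config}", split=split)
```

For example, `load_quality("stripped", "dev")` would request the HTML-stripped config's `dev` split; by default `tfds.load` downloads and prepares the data on first use.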