
quality

  • Description:

QuALITY is a multiple-choice reading comprehension dataset with long input passages.

We provide only the raw version.

Split    Examples
'dev'    230
'test'   232
'train'  300
  • Feature structure:
FeaturesDict({
    'article': Text(shape=(), dtype=object),
    'article_id': Text(shape=(), dtype=object),
    'difficults': Sequence(bool),
    'gold_labels': Sequence(int32),
    'options': Sequence(Sequence(Text(shape=(), dtype=object))),
    'question_ids': Sequence(Text(shape=(), dtype=object)),
    'questions': Sequence(Text(shape=(), dtype=object)),
    'set_unique_id': Text(shape=(), dtype=object),
    'source': Text(shape=(), dtype=object),
    'title': Text(shape=(), dtype=object),
    'topic': Text(shape=(), dtype=object),
    'url': Text(shape=(), dtype=object),
    'writer_id': Text(shape=(), dtype=object),
    'writer_labels': Sequence(int32),
})
  • Feature documentation:
Feature         Class                      Shape         Dtype   Description
FeaturesDict
article         Text                                     object
article_id      Text                                     object
difficults      Sequence(Tensor)           (None,)       bool
gold_labels     Sequence(Tensor)           (None,)       int32
options         Sequence(Sequence(Text))   (None, None)  object
question_ids    Sequence(Text)             (None,)       object
questions       Sequence(Text)             (None,)       object
set_unique_id   Text                                     object
source          Text                                     object
title           Text                                     object
topic           Text                                     object
url             Text                                     object
writer_id       Text                                     object
writer_labels   Sequence(Tensor)           (None,)       int32
  • Citation:
@article{pang2021quality,
  title={ {QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}
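As the feature structure above shows, each example bundles one article together with parallel lists of questions, per-question option lists, and gold labels. A minimal, illustrative Python sketch of that layout (the values are invented, and gold labels are assumed here to be 1-indexed, as in the original QuALITY release):

```python
# Illustrative shape of a single QuALITY example (values are made up).
example = {
    "article": "Long sci-fi story text ...",
    "questions": [
        "What does the narrator fear most?",
        "Why does the crew abandon the ship?",
    ],
    # One list of answer options per question.
    "options": [
        ["Isolation", "The dark", "Failure", "Machines"],
        ["Fuel ran out", "Mutiny", "Alien signal", "Hull breach"],
    ],
    # Assumed 1-indexed, as in the original QuALITY release.
    "gold_labels": [1, 4],
    # Whether each question was rated hard by annotators.
    "difficults": [False, True],
}

def gold_answers(ex):
    """Map each question's gold label to the text of its gold option."""
    return [
        opts[label - 1]  # shift from a 1-indexed label to a 0-indexed list
        for opts, label in zip(ex["options"], ex["gold_labels"])
    ]

print(gold_answers(example))  # ['Isolation', 'Hull breach']
```

The parallel-list layout means `questions[i]`, `options[i]`, `gold_labels[i]`, and `difficults[i]` all describe the same question.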

quality/raw (default config)

  • Config description: Raw with HTML.

  • Dataset size: 22.18 MiB

  • Examples (tfds.as_dataframe):

quality/stripped

  • Config description: Stripped of HTML.

  • Dataset size: 20.73 MiB

  • Examples (tfds.as_dataframe):
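The two configs differ only in whether HTML markup has been removed from the articles. A minimal sketch of such stripping using Python's stdlib `html.parser` (this is illustrative, not TFDS's actual preprocessing):

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text content from an HTML string, dropping all tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(article):
    """Return the article text with HTML tags removed."""
    parser = _TextExtractor()
    parser.feed(article)
    return "".join(parser.parts)

print(strip_html("<p>The ship drifted <i>silently</i>.</p>"))
# The ship drifted silently.
```

For most downstream QA modeling the stripped config is the more convenient starting point, since the markup carries no answer-relevant content.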