quality

  • Description:

QuALITY is a multiple-choice reading comprehension dataset with long input texts.

We provide only the raw version.

  • Splits:

Split      Examples
'dev'      230
'test'     232
'train'    300
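
The split sizes listed above can be checked against a local copy of the dataset. The following is a minimal sketch, assuming `tensorflow_datasets` is installed and the data can be downloaded; `EXPECTED_SPLIT_SIZES`, `split_size_mismatches`, and `load_split_sizes` are illustrative names, not part of TFDS.

```python
# Split sizes as listed in the table above.
EXPECTED_SPLIT_SIZES = {"dev": 230, "test": 232, "train": 300}


def split_size_mismatches(actual, expected=EXPECTED_SPLIT_SIZES):
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        name: (count, actual.get(name))
        for name, count in expected.items()
        if actual.get(name) != count
    }


def load_split_sizes(name="quality"):
    """Read split sizes from the dataset metadata (downloads data on first use)."""
    # Imported lazily so the pure-Python helper above has no dependencies.
    import tensorflow_datasets as tfds

    builder = tfds.builder(name)
    builder.download_and_prepare()
    return {split: info.num_examples for split, info in builder.info.splits.items()}
```

Calling `split_size_mismatches(load_split_sizes())` should return an empty dict when the local copy matches the table. Judging by the `Sequence` features, each example appears to be one article carrying several questions, so these counts are articles rather than questions.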
  • Feature structure:
FeaturesDict({
    'article': Text(shape=(), dtype=tf.string),
    'article_id': Text(shape=(), dtype=tf.string),
    'difficults': Sequence(tf.bool),
    'gold_labels': Sequence(tf.int32),
    'options': Sequence(Sequence(Text(shape=(), dtype=tf.string))),
    'question_ids': Sequence(Text(shape=(), dtype=tf.string)),
    'questions': Sequence(Text(shape=(), dtype=tf.string)),
    'set_unique_id': Text(shape=(), dtype=tf.string),
    'source': Text(shape=(), dtype=tf.string),
    'title': Text(shape=(), dtype=tf.string),
    'topic': Text(shape=(), dtype=tf.string),
    'url': Text(shape=(), dtype=tf.string),
    'writer_id': Text(shape=(), dtype=tf.string),
    'writer_labels': Sequence(tf.int32),
})
  • Feature documentation:
Feature        Class                     Shape         Dtype      Description
               FeaturesDict
article        Text                                    tf.string
article_id     Text                                    tf.string
difficults     Sequence(Tensor)          (None,)       tf.bool
gold_labels    Sequence(Tensor)          (None,)       tf.int32
options        Sequence(Sequence(Text))  (None, None)  tf.string
question_ids   Sequence(Text)            (None,)       tf.string
questions      Sequence(Text)            (None,)       tf.string
set_unique_id  Text                                    tf.string
source         Text                                    tf.string
title          Text                                    tf.string
topic          Text                                    tf.string
url            Text                                    tf.string
writer_id      Text                                    tf.string
writer_labels  Sequence(Tensor)          (None,)       tf.int32
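
Because the question-level features are stored as parallel sequences on each article, it is often convenient to flatten an example into one record per question. The sketch below assumes the sequences `question_ids`, `questions`, `options`, `gold_labels`, and `difficults` are aligned by position, and that the example has already been converted to NumPy or plain Python values (for instance with `tfds.as_numpy`). The helper names `_as_text` and `flatten_example` are hypothetical.

```python
def _as_text(value):
    """Decode bytes (as produced by NumPy conversion) to str; pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def flatten_example(example):
    """Turn one article-level example into a list of per-question records."""
    per_question = zip(
        example["question_ids"],
        example["questions"],
        example["options"],
        example["gold_labels"],
        example["difficults"],
    )
    records = []
    for question_id, question, options, gold_label, is_hard in per_question:
        records.append({
            "article_id": _as_text(example["article_id"]),
            "question_id": _as_text(question_id),
            "question": _as_text(question),
            "options": [_as_text(option) for option in options],
            # Kept exactly as stored: the indexing convention of gold_labels
            # is not stated here, so it is not used to index into options.
            "gold_label": int(gold_label),
            "is_hard": bool(is_hard),
        })
    return records
```

The article text is deliberately left out of each record to avoid duplicating a long string per question; it can be looked up again through `article_id`.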
  • Citation:
@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}

quality/raw (default config)

  • Config description: Raw with HTML.

  • Dataset size: 22.18 MiB

  • Examples (tfds.as_dataframe):

quality/stripped

  • Config description: Stripped of HTML.

  • Dataset size: 20.73 MiB

  • Examples (tfds.as_dataframe):
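
A preview of either config can be generated locally. The following is a minimal sketch, assuming `tensorflow_datasets` and `pandas` are installed; `CONFIGS`, `dataset_name`, and `preview` are illustrative names, while `tfds.load` and `tfds.as_dataframe` are the TFDS calls being exercised.

```python
# Config names and descriptions as listed above; "raw" is the default config.
CONFIGS = {"raw": "Raw with HTML.", "stripped": "Stripped of HTML."}


def dataset_name(config="raw"):
    """Build the TFDS name for a config, e.g. 'quality/stripped'."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {sorted(CONFIGS)}")
    return f"quality/{config}"


def preview(config="raw", split="dev", rows=4):
    """Return a pandas DataFrame holding the first few examples of a split."""
    # Imported lazily so dataset_name() works without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load(dataset_name(config), split=split, with_info=True)
    return tfds.as_dataframe(ds.take(rows), info)
```

For example, `preview("stripped", split="train")` would load the HTML-stripped config and return its first four training articles as a DataFrame.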