
quality

Description:

QuALITY is a multiple-choice reading comprehension dataset with long input texts.

We provide only the raw version.

Homepage: https://github.com/nyu-mll/quality
Source code: tfds.datasets.quality.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 17.26 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'dev'`	230
`'test'`	232
`'train'`	300
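
A minimal sketch of loading one of the splits above with `tfds.load`. The helper names `dataset_name` and `load_quality` are hypothetical and exist only for illustration; the config names `raw` and `stripped` and the split names are the ones listed on this page. TensorFlow Datasets is imported lazily, so the name helper works even where the library is not installed.

```python
# Config and split names as listed on this catalog page.
CONFIGS = ("raw", "stripped")
SPLITS = ("train", "dev", "test")


def dataset_name(config="raw"):
    """Build the TFDS name string, e.g. 'quality/raw' (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    return f"quality/{config}"


def load_quality(config="raw", split="train"):
    """Load one split as a tf.data.Dataset (hypothetical wrapper around tfds.load)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Lazy import: the first call may download and prepare the dataset.
    import tensorflow_datasets as tfds
    return tfds.load(dataset_name(config), split=split)
```

For instance, `load_quality("stripped", "dev")` would request the `dev` split of the HTML-stripped config.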

Feature structure:

FeaturesDict({
    'article': Text(shape=(), dtype=string),
    'article_id': Text(shape=(), dtype=string),
    'difficults': Sequence(bool),
    'gold_labels': Sequence(int32),
    'options': Sequence(Sequence(Text(shape=(), dtype=string))),
    'question_ids': Sequence(Text(shape=(), dtype=string)),
    'questions': Sequence(Text(shape=(), dtype=string)),
    'set_unique_id': Text(shape=(), dtype=string),
    'source': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
    'topic': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
    'writer_id': Text(shape=(), dtype=string),
    'writer_labels': Sequence(int32),
})
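
Each example holds one article plus parallel sequences describing its questions, so a common first step is to zip those sequences into per-question records. The sketch below works on a plain Python dict shaped like the structure above (for example, after converting tensors to Python values and decoding byte strings); `flatten_example` and `label_base` are hypothetical names. It assumes `gold_labels` is 1-based, which this page does not state, so the base is a parameter. Labels that fall outside the option list (such as a placeholder for an unlabeled question) yield `None`.

```python
def flatten_example(example, label_base=1):
    """Turn one article-level example into a list of per-question records.

    label_base is an assumption about how gold_labels are indexed
    (1 means the first option is labeled 1); adjust it if the data differs.
    """
    records = []
    rows = zip(
        example["question_ids"],
        example["questions"],
        example["options"],
        example["gold_labels"],
        example["difficults"],
    )
    for question_id, question, options, gold, hard in rows:
        index = gold - label_base
        # Out-of-range labels are treated as "no gold answer available".
        answer = options[index] if 0 <= index < len(options) else None
        records.append({
            "article_id": example["article_id"],
            "question_id": question_id,
            "question": question,
            "options": list(options),
            "answer": answer,
            "difficult": bool(hard),
        })
    return records
```

Keeping `article_id` on every record makes it possible to join a question back to its long `article` text without duplicating that text per question.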

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
article	Text		string
article_id	Text		string
difficults	Sequence(Tensor)	(None,)	bool
gold_labels	Sequence(Tensor)	(None,)	int32
options	Sequence(Sequence(Text))	(None, None)	string
question_ids	Sequence(Text)	(None,)	string
questions	Sequence(Text)	(None,)	string
set_unique_id	Text		string
source	Text		string
title	Text		string
topic	Text		string
url	Text		string
writer_id	Text		string
writer_labels	Sequence(Tensor)	(None,)	int32

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}

quality/raw (default config)

Config description: Raw with HTML.
Dataset size: 22.18 MiB

quality/stripped

Config description: Stripped of HTML.
Dataset size: 20.73 MiB