- Description:
A new dataset of 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. We are also including a corpus of over 14 million science sentences relevant to the task, and an implementation of three neural baseline models for this dataset. We pose ARC as a challenge to the community.
- Homepage: https://allenai.org/data/arc 
- Source code: tfds.datasets.ai2_arc.Builder
- Versions: 1.0.0 (default): No release notes.
- Download size: 649.30 MiB
- Auto-cached (documentation): Yes 
- Feature structure: 
FeaturesDict({
    'answerKey': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'choices': Sequence({
        'label': ClassLabel(shape=(), dtype=int64, num_classes=5),
        'text': Text(shape=(), dtype=string),
    }),
    'id': Text(shape=(), dtype=string),
    'question': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| answerKey | ClassLabel | | int64 | |
| choices | Sequence | | | |
| choices/label | ClassLabel | | int64 | |
| choices/text | Text | | string | |
| id | Text | | string | |
| question | Text | | string | |
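The feature structure above can be read with a small helper. The following is a minimal sketch, not part of the official documentation. It assumes that `answerKey` and `choices/label` share the same integer class-index space (both are declared as `ClassLabel` with 5 classes), and that a `Sequence` of a dict decodes to a dict of parallel arrays. The function names `load_split` and `correct_choice_text` are hypothetical, and the sample record is made up for illustration rather than taken from the dataset.

```python
def load_split(config="ai2_arc/ARC-Challenge", split="train"):
    """Return an iterable of NumPy-decoded examples (downloads on first use)."""
    # Imported lazily so the helper below can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.as_numpy(tfds.load(config, split=split))


def correct_choice_text(example):
    """Return the text of the choice whose label equals the answer key.

    Assumption: 'answerKey' and 'choices/label' use the same class indices.
    Raises ValueError if no choice carries the answer key's label.
    """
    answer = int(example["answerKey"])
    labels = [int(label) for label in example["choices"]["label"]]
    text = example["choices"]["text"][labels.index(answer)]
    # TFDS Text features decode to bytes under tfds.as_numpy.
    return text.decode("utf-8") if isinstance(text, bytes) else text


# Made-up record that mirrors the feature structure (not real dataset content).
sample = {
    "id": b"hypothetical_id_1",
    "question": b"Which force pulls objects toward Earth?",
    "answerKey": 1,
    "choices": {
        "label": [0, 1, 2, 3],
        "text": [b"friction", b"gravity", b"magnetism", b"tension"],
    },
}
print(correct_choice_text(sample))
```

With the real data, the same helper could be applied to each record yielded by `load_split()`. Questions with fewer than five options simply have shorter `label` and `text` sequences.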
- Supervised keys (see the as_supervised doc): None
- Figure (tfds.show_examples): Not supported. 
- Citation: 
@article{allenai:arc,
      author    = {Peter Clark  and Isaac Cowhey and Oren Etzioni and Tushar Khot and
                    Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
      title     = {Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
      journal   = {arXiv:1803.05457v1},
      year      = {2018},
}
ai2_arc/ARC-Challenge (default config)
- Config description: Challenge Set of 2590 "hard" questions (those that both a retrieval and a co-occurrence method fail to answer correctly) 
- Dataset size: 939.91 KiB
- Splits: 
| Split | Examples | 
|---|---|
| 'test' | 1,172 | 
| 'train' | 1,119 | 
| 'validation' | 299 | 
ai2_arc/ARC-Easy
- Config description: Easy Set of 5197 questions for the ARC Challenge. 
- Dataset size: 1.63 MiB
- Splits: 
| Split | Examples | 
|---|---|
| 'test' | 2,376 | 
| 'train' | 2,251 | 
| 'validation' | 570 | 
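As a quick consistency check, the split sizes listed for both configs can be summed and compared with the totals stated in the config descriptions (2590 and 5197) and in the dataset description (7,787). This is only an illustrative sketch: the `SPLITS` dictionary is transcribed by hand from the tables on this page and is not an API of the library.

```python
# Split sizes copied from the tables on this page.
SPLITS = {
    "ARC-Challenge": {"test": 1172, "train": 1119, "validation": 299},
    "ARC-Easy": {"test": 2376, "train": 2251, "validation": 570},
}

# Total number of questions per config, then across both configs.
config_totals = {name: sum(sizes.values()) for name, sizes in SPLITS.items()}
grand_total = sum(config_totals.values())

print(config_totals)  # {'ARC-Challenge': 2590, 'ARC-Easy': 5197}
print(grand_total)    # 7787
```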