ai2_arc_with_ir
A new dataset of 7,787 genuine grade-school level, multiple-choice science
questions, assembled to encourage research in advanced question-answering. The
dataset is partitioned into a Challenge Set and an Easy Set, where the former
contains only questions answered incorrectly by both a retrieval-based algorithm
and a word co-occurrence algorithm. We are also including a corpus of over 14
million science sentences relevant to the task, and an implementation of three
neural baseline models for this dataset. We pose ARC as a challenge to the
community.
Compared to the original dataset, this adds context sentences obtained through
information retrieval in the same way as UnifiedQA (see
https://arxiv.org/abs/2005.00700).
Feature structure:

    FeaturesDict({
        'answerKey': ClassLabel(shape=(), dtype=int64, num_classes=5),
        'choices': Sequence({
            'label': ClassLabel(shape=(), dtype=int64, num_classes=5),
            'text': Text(shape=(), dtype=string),
        }),
        'id': Text(shape=(), dtype=string),
        'paragraph': Text(shape=(), dtype=string),
        'question': Text(shape=(), dtype=string),
    })
Feature documentation:

| Feature       | Class        | Shape | Dtype  | Description |
|---------------|--------------|-------|--------|-------------|
|               | FeaturesDict |       |        |             |
| answerKey     | ClassLabel   |       | int64  |             |
| choices       | Sequence     |       |        |             |
| choices/label | ClassLabel   |       | int64  |             |
| choices/text  | Text         |       | string |             |
| id            | Text         |       | string |             |
| paragraph     | Text         |       | string |             |
| question      | Text         |       | string |             |
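The structure above maps directly onto the example dictionaries yielded by tfds.load. As a minimal sketch (assuming TensorFlow Datasets is installed and the data can be downloaded), the default ARC-Challenge-IR config can be loaded and one decoded example inspected like this:

    import tensorflow_datasets as tfds

    # Load the default config; with_info=True also returns the DatasetInfo
    # object that holds the FeaturesDict documented above.
    ds, info = tfds.load('ai2_arc_with_ir/ARC-Challenge-IR',
                         split='train', with_info=True)
    print(info.features['answerKey'].names)  # class names of the answer labels

    for example in ds.take(1):
        # Each example is a dict of eager tensors keyed by the feature names.
        print(example['id'].numpy().decode('utf-8'))
        print(example['question'].numpy().decode('utf-8'))
        print(example['paragraph'].numpy().decode('utf-8'))  # retrieved IR context
        print([t.decode('utf-8') for t in example['choices']['text'].numpy()])
        print(int(example['answerKey'].numpy()))             # class index of the answer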
Citation:

    @article{allenai:arc,
      author = {Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and
                Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
      title = {Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
      journal = {arXiv:1803.05457v1},
      year = {2018},
    }
    @article{2020unifiedqa,
      title = {UnifiedQA: Crossing Format Boundaries With a Single QA System},
      author = {D. Khashabi and S. Min and T. Khot and A. Sabharwal and O. Tafjord and P. Clark and H. Hajishirzi},
      journal = {arXiv preprint},
      year = {2020}
    }
ai2_arc_with_ir/ARC-Challenge-IR (default config)
Config description: Challenge Set of 2,590 "hard" questions (those that both a retrieval and a co-occurrence method fail to answer correctly).

Dataset size: 3.76 MiB

Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 1,172    |
| `'train'`      | 1,119    |
| `'validation'` | 299      |
ai2_arc_with_ir/ARC-Easy-IR
Config description: Easy Set of 5,197 questions for the ARC Challenge.

Dataset size: 7.49 MiB

Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 2,376    |
| `'train'`      | 2,251    |
| `'validation'` | 570      |
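No supervised keys are defined for either config, so training code typically assembles its own inputs and targets from the fields. The helper below is a hypothetical sketch, not part of the TFDS API; it assumes an eagerly decoded example, as in the loading sketch above, and that answerKey and choices/label share the same 5-class encoding, as the feature structure suggests:

    def correct_choice_text(example):
        """Return the text of the choice whose label matches answerKey."""
        labels = example['choices']['label'].numpy()  # int64 vector, one entry per choice
        texts = example['choices']['text'].numpy()    # bytes vector, same length
        key = int(example['answerKey'].numpy())       # integer class index of the answer
        for label, text in zip(labels, texts):
            if int(label) == key:
                return text.decode('utf-8')
        return None  # no matching label (not expected for well-formed examples)

A UnifiedQA-style input string can then be built by concatenating the question, the choice texts, and the retrieved paragraph from each example.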