trivia_qa
TriviaQA is a reading comprehension dataset containing over 650K
question-answer-evidence triples. TriviaQA includes 95K question-answer pairs
authored by trivia enthusiasts and independently gathered evidence documents,
six per question on average, that provide high-quality distant supervision for
answering the questions.
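The distant-supervision signal works roughly like this: an evidence document counts as a (possibly noisy) positive for a question if it contains any of the answer's aliases. The sketch below illustrates the idea with a simple lowercase substring test; the dataset's actual answer-matching preprocessing is more involved, and the helper name and example strings are illustrative only.

```python
# Sketch of distant supervision: a document is a weak positive if it
# contains any answer alias. Lowercase substring matching here is a
# simplification of the dataset's real matching heuristic.

def contains_answer(document: str, aliases: list[str]) -> bool:
    """Return True if any answer alias occurs in the document text."""
    doc = document.lower()
    return any(alias.lower() in doc for alias in aliases)

doc = "The Great Barrier Reef lies off the coast of Queensland, Australia."
aliases = ["Australia", "Commonwealth of Australia"]
print(contains_answer(doc, aliases))  # True
print(contains_answer("An unrelated sentence.", aliases))  # False
```

Because the pairing is automatic rather than hand-annotated, some "positive" documents will not actually support the answer, which is the usual trade-off of distant supervision.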
FeaturesDict({
'answer': FeaturesDict({
'aliases': Sequence(Text(shape=(), dtype=string)),
'matched_wiki_entity_name': Text(shape=(), dtype=string),
'normalized_aliases': Sequence(Text(shape=(), dtype=string)),
'normalized_matched_wiki_entity_name': Text(shape=(), dtype=string),
'normalized_value': Text(shape=(), dtype=string),
'type': Text(shape=(), dtype=string),
'value': Text(shape=(), dtype=string),
}),
'entity_pages': Sequence({
'doc_source': Text(shape=(), dtype=string),
'filename': Text(shape=(), dtype=string),
'title': Text(shape=(), dtype=string),
'wiki_context': Text(shape=(), dtype=string),
}),
'question': Text(shape=(), dtype=string),
'question_id': Text(shape=(), dtype=string),
'question_source': Text(shape=(), dtype=string),
'search_results': Sequence({
'description': Text(shape=(), dtype=string),
'filename': Text(shape=(), dtype=string),
'rank': int32,
'search_context': Text(shape=(), dtype=string),
'title': Text(shape=(), dtype=string),
'url': Text(shape=(), dtype=string),
}),
})
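Concretely, a decoded record is a nested structure mirroring the schema above. The sketch below uses a hand-built Python dict as a stand-in for one example, assuming string features decode to `str` and each `Sequence({...})` decodes to a dict of parallel lists (one entry per document, as in TFDS's NumPy view). All field values are placeholders, not real dataset content.

```python
# A hand-built stand-in for one decoded example. Field names follow the
# FeaturesDict schema; values are placeholders.
example = {
    "question": "Which country is the Great Barrier Reef off the coast of?",
    "question_id": "qid_placeholder",
    "question_source": "placeholder",
    "answer": {
        "value": "Australia",
        "normalized_value": "australia",
        "aliases": ["Australia", "Commonwealth of Australia"],
        "normalized_aliases": ["australia", "commonwealth of australia"],
        "matched_wiki_entity_name": "Australia",
        "normalized_matched_wiki_entity_name": "australia",
        "type": "WikipediaEntity",
    },
    # Sequence features decode to parallel lists, one entry per document.
    "entity_pages": {
        "doc_source": ["placeholder"],
        "filename": ["placeholder.txt"],
        "title": ["Australia"],
        "wiki_context": ["Australia is a country ..."],
    },
    "search_results": {
        "rank": [1],
        "title": ["Great Barrier Reef"],
        "url": ["https://example.com"],
        "description": ["placeholder description"],
        "search_context": ["The Great Barrier Reef ..."],
        "filename": ["placeholder.txt"],
    },
}

# Walk the parallel lists: pair each entity page title with its context.
pages = list(zip(example["entity_pages"]["title"],
                 example["entity_pages"]["wiki_context"]))
print(example["answer"]["normalized_value"])  # australia
print(len(pages))  # 1
```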
| Feature | Class | Shape | Dtype | Description |
|--------------------------------------------|----------------|---------|--------|-------------|
| | FeaturesDict | | | |
| answer | FeaturesDict | | | |
| answer/aliases | Sequence(Text) | (None,) | string | |
| answer/matched_wiki_entity_name | Text | | string | |
| answer/normalized_aliases | Sequence(Text) | (None,) | string | |
| answer/normalized_matched_wiki_entity_name | Text | | string | |
| answer/normalized_value | Text | | string | |
| answer/type | Text | | string | |
| answer/value | Text | | string | |
| entity_pages | Sequence | | | |
| entity_pages/doc_source | Text | | string | |
| entity_pages/filename | Text | | string | |
| entity_pages/title | Text | | string | |
| entity_pages/wiki_context | Text | | string | |
| question | Text | | string | |
| question_id | Text | | string | |
| question_source | Text | | string | |
| search_results | Sequence | | | |
| search_results/description | Text | | string | |
| search_results/filename | Text | | string | |
| search_results/rank | Tensor | | int32 | |
| search_results/search_context | Text | | string | |
| search_results/title | Text | | string | |
| search_results/url | Text | | string | |
@article{2017arXivtriviaqa,
  author        = {{Joshi}, Mandar and {Choi}, Eunsol and {Weld}, Daniel and
                   {Zettlemoyer}, Luke},
  title         = "{TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension}",
  journal       = {arXiv e-prints},
  year          = 2017,
  eid           = {arXiv:1705.03551},
  pages         = {arXiv:1705.03551},
  archivePrefix = {arXiv},
  eprint        = {1705.03551},
}
trivia_qa/rc (default config)
Config description: Question-answer pairs where all documents for a
given question contain the answer string(s). Includes context from Wikipedia
and search results.
Download size: 2.48 GiB
Dataset size: 14.99 GiB
Auto-cached
(documentation):
No
Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 17,210   |
| `'train'`      | 138,384  |
| `'validation'` | 18,669   |
trivia_qa/rc.nocontext
Config description: Question-answer pairs where all documents for a
given question contain the answer string(s).
Download size: 2.48 GiB
Dataset size: 196.84 MiB
Auto-cached
(documentation):
Yes (test, validation), only when shuffle_files=False (train)
Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 17,210   |
| `'train'`      | 138,384  |
| `'validation'` | 18,669   |
trivia_qa/unfiltered
Config description: 110k question-answer pairs for open domain QA where
not all documents for a given question contain the answer string(s). This
makes the unfiltered dataset more appropriate for IR-style QA. Includes
context from Wikipedia and search results.
Download size: 3.07 GiB
Dataset size: 27.27 GiB
Auto-cached
(documentation):
No
Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 10,832   |
| `'train'`      | 87,622   |
| `'validation'` | 11,313   |
trivia_qa/unfiltered.nocontext
Config description: 110k question-answer pairs for open domain QA where
not all documents for a given question contain the answer string(s). This
makes the unfiltered dataset more appropriate for IR-style QA.
Download size: 603.25 MiB
Dataset size: 119.78 MiB
Auto-cached
(documentation):
Yes
Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 10,832   |
| `'train'`      | 87,622   |
| `'validation'` | 11,313   |
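The distinction between the filtered (`rc`) configs and the `unfiltered` configs can be sketched as a predicate over a question's evidence documents: `rc` keeps questions where every document contains an answer string, while `unfiltered` keeps all retrieved documents regardless of whether they mention the answer. The helper below is hypothetical, and lowercase substring matching again stands in for the dataset's actual answer-matching heuristic.

```python
# Hypothetical predicate contrasting the 'rc' and 'unfiltered' configs:
# 'rc' retains only questions whose every evidence document contains at
# least one answer alias (substring match is a simplification).

def is_rc_style(documents: list[str], aliases: list[str]) -> bool:
    """True if every document contains at least one answer alias."""
    lowered = [a.lower() for a in aliases]
    return all(any(a in doc.lower() for a in lowered) for doc in documents)

docs = [
    "Canberra is the capital of Australia.",
    "Sydney is the largest city in Australia.",
]
print(is_rc_style(docs, ["Australia"]))                         # True
print(is_rc_style(docs + ["Unrelated text."], ["Australia"]))   # False
```

Questions that fail this predicate still appear in `unfiltered`, which is why that config is described as better suited to IR-style QA, where a retriever must first find documents that actually contain the answer.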
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2023-02-12 UTC.