unified_qa

  • Description:

The UnifiedQA benchmark consists of 20 main question answering (QA) datasets (each may have multiple versions) that target different formats as well as various complex linguistic phenomena. These datasets are grouped into several formats/categories, including: extractive QA, abstractive QA, multiple-choice QA, and yes/no QA. Additionally, contrast sets are used for several datasets (denoted with "contrastsets"). These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset. For several datasets that do not come with evidence paragraphs, two variants are included: one where the datasets are used as-is and another that uses paragraphs fetched via an information retrieval system as additional evidence, indicated with "_ir" tags.

More information can be found at: https://github.com/allenai/unifiedqa

FeaturesDict({
    'input': string,
    'output': string,
})
  • Feature documentation:
Feature Class Shape Dtype Description
FeaturesDict
input Tensor string
output Tensor string

unified_qa/ai2_science_elementary (default config)

  • Config description: The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is 4-way multiple choice format and may or may not include a diagram element. This set consists of questions used for elementary school grade levels.

  • Download size: 345.59 KiB

  • Dataset size: 390.02 KiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 542
'train' 623
'validation' 123
  • Citation:
http://data.allenai.org/ai2-science-questions

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/ai2_science_middle

  • Config description: The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is 4-way multiple choice format and may or may not include a diagram element. This set consists of questions used for middle school grade levels.

  • Download size: 428.41 KiB

  • Dataset size: 477.40 KiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 679
'train' 605
'validation' 125
  • Citation:
http://data.allenai.org/ai2-science-questions

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/ambigqa

  • Config description: AmbigQA is an open-domain question answering task which involves finding every plausible answer, and then rewriting the question for each one to resolve the ambiguity.

  • Download size: 2.27 MiB

  • Dataset size: 3.04 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 19,806
'validation' 5,674
  • Citation:
@inproceedings{min-etal-2020-ambigqa,
    title = "{A}mbig{QA}: Answering Ambiguous Open-domain Questions",
    author = "Min, Sewon  and
      Michael, Julian  and
      Hajishirzi, Hannaneh  and
      Zettlemoyer, Luke",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.466",
    doi = "10.18653/v1/2020.emnlp-main.466",
    pages = "5783--5797",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/arc_easy

  • Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions.

  • Download size: 1.24 MiB

  • Dataset size: 1.42 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 2,376
'train' 2,251
'validation' 570
  • Citation:
@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/arc_easy_dev

  • Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions.

  • Download size: 1.24 MiB

  • Dataset size: 1.42 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 2,376
'train' 2,251
'validation' 570
  • Citation:
@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/arc_easy_with_ir

  • Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.

  • Download size: 7.00 MiB

  • Dataset size: 7.17 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 2,376
'train' 2,251
'validation' 570
  • Citation:
@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/arc_easy_with_ir_dev

  • Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.

  • Download size: 7.00 MiB

  • Dataset size: 7.17 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 2,376
'train' 2,251
'validation' 570
  • Citation:
@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/arc_hard

  • Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions.

  • Download size: 758.03 KiB

  • Dataset size: 848.28 KiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,172
'train' 1,119
'validation' 299
  • Citation:
@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/arc_hard_dev

  • Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions.

  • Download size: 758.03 KiB

  • Dataset size: 848.28 KiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,172
'train' 1,119
'validation' 299
  • Citation:
@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/arc_hard_with_ir

  • Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.

  • Download size: 3.53 MiB

  • Dataset size: 3.62 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,172
'train' 1,119
'validation' 299
  • Citation:
@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/arc_hard_with_ir_dev

  • Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.

  • Download size: 3.53 MiB

  • Dataset size: 3.62 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,172
'train' 1,119
'validation' 299
  • Citation:
@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/boolq

  • Config description: BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring ---they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.

  • Download size: 7.77 MiB

  • Dataset size: 8.20 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 9,427
'validation' 3,270
  • Citation:
@inproceedings{clark-etal-2019-boolq,
    title = "{B}ool{Q}: Exploring the Surprising Difficulty of Natural Yes/No Questions",
    author = "Clark, Christopher  and
      Lee, Kenton  and
      Chang, Ming-Wei  and
      Kwiatkowski, Tom  and
      Collins, Michael  and
      Toutanova, Kristina",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1300",
    doi = "10.18653/v1/N19-1300",
    pages = "2924--2936",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/boolq_np

  • Config description: BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring ---they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks. This version adds natural perturbations to the original version.

  • Download size: 10.80 MiB

  • Dataset size: 11.40 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 9,727
'validation' 7,596
  • Citation:
@inproceedings{khashabi-etal-2020-bang,
    title = "More Bang for Your Buck: Natural Perturbation for Robust Question Answering",
    author = "Khashabi, Daniel  and
      Khot, Tushar  and
      Sabharwal, Ashish",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.12",
    doi = "10.18653/v1/2020.emnlp-main.12",
    pages = "163--170",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/commonsenseqa

  • Config description: CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers . It contains questions with one correct answer and four distractor answers.

  • Download size: 1.79 MiB

  • Dataset size: 2.19 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,140
'train' 9,741
'validation' 1,221
  • Citation:
@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon  and
      Herzig, Jonathan  and
      Lourie, Nicholas  and
      Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/commonsenseqa_test

  • Config description: CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers . It contains questions with one correct answer and four distractor answers.

  • Download size: 1.79 MiB

  • Dataset size: 2.19 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,140
'train' 9,741
'validation' 1,221
  • Citation:
@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon  and
      Herzig, Jonathan  and
      Lourie, Nicholas  and
      Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/contrast_sets_boolq

  • Config description: BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring ---they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.

  • Download size: 438.51 KiB

  • Dataset size: 462.35 KiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 340
'validation' 340
  • Citation:
@inproceedings{clark-etal-2019-boolq,
    title = "{B}ool{Q}: Exploring the Surprising Difficulty of Natural Yes/No Questions",
    author = "Clark, Christopher  and
      Lee, Kenton  and
      Chang, Ming-Wei  and
      Kwiatkowski, Tom  and
      Collins, Michael  and
      Toutanova, Kristina",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1300",
    doi = "10.18653/v1/N19-1300",
    pages = "2924--2936",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/contrast_sets_drop

  • Config description: DROP is a crowdsourced, adversarially-created QA benchmark, in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.

  • Download size: 2.20 MiB

  • Dataset size: 2.26 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 947
'validation' 947
  • Citation:
@inproceedings{dua-etal-2019-drop,
    title = "{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs",
    author = "Dua, Dheeru  and
      Wang, Yizhong  and
      Dasigi, Pradeep  and
      Stanovsky, Gabriel  and
      Singh, Sameer  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1246",
    doi = "10.18653/v1/N19-1246",
    pages = "2368--2378",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/contrast_sets_quoref

  • Config description: This dataset tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark containing questions over paragraphs from Wikipedia, a system must resolve hard coreferences before selecting the appropriate span(s) in the paragraphs for answering questions. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.

  • Download size: 2.60 MiB

  • Dataset size: 2.65 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 700
'validation' 700
  • Citation:
@inproceedings{dasigi-etal-2019-quoref,
    title = "{Q}uoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning",
    author = "Dasigi, Pradeep  and
      Liu, Nelson F.  and
      Marasovi{'c}, Ana  and
      Smith, Noah A.  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-1606",
    doi = "10.18653/v1/D19-1606",
    pages = "5925--5932",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/contrast_sets_ropes

  • Config description: This dataset tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented a background passage containing a causal or qualitative relation(s) (e.g., "animal pollinators increase efficiency of fertilization in flowers"), a novel situation that uses this background, and questions that require reasoning about effects of the relationships in the background passage in the context of the situation. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.

  • Download size: 1.97 MiB

  • Dataset size: 2.04 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 974
'validation' 974
  • Citation:
@inproceedings{lin-etal-2019-reasoning,
    title = "Reasoning Over Paragraph Effects in Situations",
    author = "Lin, Kevin  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5808",
    doi = "10.18653/v1/D19-5808",
    pages = "58--62",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/drop

  • Config description: DROP is a crowdsourced, adversarially-created QA benchmark, in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets.

  • Download size: 105.18 MiB

  • Dataset size: 108.16 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 77,399
'validation' 9,536
  • Citation:
@inproceedings{dua-etal-2019-drop,
    title = "{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs",
    author = "Dua, Dheeru  and
      Wang, Yizhong  and
      Dasigi, Pradeep  and
      Stanovsky, Gabriel  and
      Singh, Sameer  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1246",
    doi = "10.18653/v1/N19-1246",
    pages = "2368--2378",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/mctest

  • Config description: MCTest requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension. Reading comprehension can test advanced abilities such as causal reasoning and understanding the world, yet, by being multiple-choice, still provide a clear metric. By being fictional, the answer typically can be found only in the story itself. The stories and questions are also carefully limited to those a young child would understand, reducing the world knowledge that is required for the task.

  • Download size: 2.14 MiB

  • Dataset size: 2.20 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 1,480
'validation' 320
  • Citation:
@inproceedings{richardson-etal-2013-mctest,
    title = "{MCT}est: A Challenge Dataset for the Open-Domain Machine Comprehension of Text",
    author = "Richardson, Matthew  and
      Burges, Christopher J.C.  and
      Renshaw, Erin",
    booktitle = "Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing",
    month = oct,
    year = "2013",
    address = "Seattle, Washington, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D13-1020",
    pages = "193--203",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/mctest_corrected_the_separator

  • Config description: MCTest requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension. Reading comprehension can test advanced abilities such as causal reasoning and understanding the world, yet, by being multiple-choice, still provide a clear metric. By being fictional, the answer typically can be found only in the story itself. The stories and questions are also carefully limited to those a young child would understand, reducing the world knowledge that is required for the task.

  • Download size: 2.15 MiB

  • Dataset size: 2.21 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 1,480
'validation' 320
  • Citation:
@inproceedings{richardson-etal-2013-mctest,
    title = "{MCT}est: A Challenge Dataset for the Open-Domain Machine Comprehension of Text",
    author = "Richardson, Matthew  and
      Burges, Christopher J.C.  and
      Renshaw, Erin",
    booktitle = "Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing",
    month = oct,
    year = "2013",
    address = "Seattle, Washington, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D13-1020",
    pages = "193--203",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/multirc

  • Config description: MultiRC is a reading comprehension challenge in which questions can only be answered by taking into account information from multiple sentences. Questions and answers for this challenge were solicited and verified through a 4-step crowdsourcing experiment. The dataset contains questions for paragraphs across 7 different domains ( elementary school science, news, travel guides, fiction stories, etc) bringing in linguistic diversity to the texts and to the questions wordings.

  • Download size: 897.09 KiB

  • Dataset size: 918.42 KiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 312
'validation' 312
  • Citation:
@inproceedings{khashabi-etal-2018-looking,
    title = "Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences",
    author = "Khashabi, Daniel  and
      Chaturvedi, Snigdha  and
      Roth, Michael  and
      Upadhyay, Shyam  and
      Roth, Dan",
    booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
    month = jun,
    year = "2018",
    address = "New Orleans, Louisiana",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N18-1023",
    doi = "10.18653/v1/N18-1023",
    pages = "252--262",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/narrativeqa

  • Config description: NarrativeQA is an English-lanaguage dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.

  • Download size: 308.28 MiB

  • Dataset size: 311.22 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'test' 21,114
'train' 65,494
'validation' 6,922
  • Citation:
@article{kocisky-etal-2018-narrativeqa,
    title = "The {N}arrative{QA} Reading Comprehension Challenge",
    author = "Ko{
{c} }isk{'y}, Tom{'a}{
{s} }  and
      Schwarz, Jonathan  and
      Blunsom, Phil  and
      Dyer, Chris  and
      Hermann, Karl Moritz  and
      Melis, G{'a}bor  and
      Grefenstette, Edward",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "6",
    year = "2018",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q18-1023",
    doi = "10.1162/tacl_a_00023",
    pages = "317--328",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/narrativeqa_dev

  • Config description: NarrativeQA is an English-lanaguage dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.

  • Download size: 308.28 MiB

  • Dataset size: 311.22 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'test' 21,114
'train' 65,494
'validation' 6,922
  • Citation:
@article{kocisky-etal-2018-narrativeqa,
    title = "The {N}arrative{QA} Reading Comprehension Challenge",
    author = "Ko{
{c} }isk{'y}, Tom{'a}{
{s} }  and
      Schwarz, Jonathan  and
      Blunsom, Phil  and
      Dyer, Chris  and
      Hermann, Karl Moritz  and
      Melis, G{'a}bor  and
      Grefenstette, Edward",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "6",
    year = "2018",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q18-1023",
    doi = "10.1162/tacl_a_00023",
    pages = "317--328",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/natural_questions

  • Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets.

  • Download size: 6.95 MiB

  • Dataset size: 9.88 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'train' 96,075
'validation' 2,295
  • Citation:
@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/natural_questions_direct_ans

  • Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets. This version consists of direct-answer questions.

  • Download size: 6.82 MiB

  • Dataset size: 10.19 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 6,468
'train' 96,676
'validation' 10,693
  • Citation:
@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/natural_questions_direct_ans_test

  • Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets. This version consists of direct-answer questions.

  • Download size: 6.82 MiB

  • Dataset size: 10.19 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 6,468
'train' 96,676
'validation' 10,693
  • Citation:
@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/natural_questions_with_dpr_para

  • Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets. This version includes additional paragraphs (obtained using the DPR retrieval engine) to augment each question.

  • Download size: 319.22 MiB

  • Dataset size: 322.91 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'train' 96,676
'validation' 10,693
  • Citation:
@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."

unified_qa/natural_questions_with_dpr_para_test

  • Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets. This version includes additional paragraphs (obtained using the DPR retrieval engine) to augment each question.

  • Download size: 306.94 MiB

  • Dataset size: 310.48 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'test' 6,468
'train' 96,676
  • Citation:
@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal<