# answer_equivalence
- **Description**:

The Answer Equivalence Dataset contains human ratings of predictions from
several models on the SQuAD dataset. The ratings establish whether a predicted
answer is 'equivalent' to the gold answer, taking into account both the question
and the context.

More specifically, by 'equivalent' we mean that the predicted answer contains at
least the same information as the gold answer and does not add superfluous
information. The dataset contains annotations for:

- predictions from BiDAF on SQuAD dev
- predictions from XLNet on SQuAD dev
- predictions from Luke on SQuAD dev
- predictions from Albert on SQuAD training, dev and test examples
- **Homepage**: https://github.com/google-research-datasets/answer-equivalence-dataset
- **Source code**: `tfds.datasets.answer_equivalence.Builder`
- **Versions**: `1.0.0` (default): Initial release.
- **Download size**: `45.86 MiB`
- **Dataset size**: `47.24 MiB`
- **Auto-cached**: Yes

- **Splits**:

| Split         | Examples |
|---------------|----------|
| `'ae_dev'`    | 4,446    |
| `'ae_test'`   | 9,724    |
| `'dev_bidaf'` | 7,522    |
| `'dev_luke'`  | 4,590    |
| `'dev_xlnet'` | 7,932    |
| `'train'`     | 9,090    |
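As a quick orientation for working with these splits, here is a minimal Python sketch using the standard TFDS API; it assumes `tensorflow_datasets` is installed and that the dataset is available under the catalog name `answer_equivalence`.

```python
import tensorflow_datasets as tfds

# Load the annotated training split together with the dataset metadata.
ds_train, info = tfds.load('answer_equivalence', split='train', with_info=True)

# Any split from the table above can be requested by name,
# e.g. the BiDAF-based dev annotations:
ds_bidaf = tfds.load('answer_equivalence', split='dev_bidaf')

# Split sizes reported by TFDS should match the table (e.g. 9,090 for 'train').
print(info.splits['train'].num_examples)
```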
- **Feature structure**:

FeaturesDict({
'candidate': Text(shape=(), dtype=string),
'context': Text(shape=(), dtype=string),
'gold_index': int32,
'qid': Text(shape=(), dtype=string),
'question': Text(shape=(), dtype=string),
'question_1': ClassLabel(shape=(), dtype=int64, num_classes=3),
'question_2': ClassLabel(shape=(), dtype=int64, num_classes=3),
'question_3': ClassLabel(shape=(), dtype=int64, num_classes=3),
'question_4': ClassLabel(shape=(), dtype=int64, num_classes=3),
'reference': Text(shape=(), dtype=string),
'score': float32,
})
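To show how this feature structure surfaces at read time, the sketch below iterates one example as NumPy data. The interpretation in the comments (that `candidate` holds the model-predicted answer and `reference` the gold answer) is an assumption based on the feature names and the dataset description, not something spelled out on this page.

```python
import tensorflow_datasets as tfds

ds = tfds.load('answer_equivalence', split='ae_dev')

# Each element is a dict keyed exactly like the FeaturesDict above.
for example in tfds.as_numpy(ds.take(1)):
    question = example['question'].decode('utf-8')    # Text features arrive as bytes
    candidate = example['candidate'].decode('utf-8')  # assumed: model-predicted answer
    reference = example['reference'].decode('utf-8')  # assumed: gold answer
    ratings = [example[f'question_{i}'] for i in range(1, 5)]  # four 3-class rater labels
    print(question, candidate, reference, ratings, example['score'])
```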
- **Feature documentation**:

| Feature    | Class        | Shape | Dtype   | Description |
|------------|--------------|-------|---------|-------------|
|            | FeaturesDict |       |         |             |
| candidate  | Text         |       | string  |             |
| context    | Text         |       | string  |             |
| gold_index | Tensor       |       | int32   |             |
| qid        | Text         |       | string  |             |
| question   | Text         |       | string  |             |
| question_1 | ClassLabel   |       | int64   |             |
| question_2 | ClassLabel   |       | int64   |             |
| question_3 | ClassLabel   |       | int64   |             |
| question_4 | ClassLabel   |       | int64   |             |
| reference  | Text         |       | string  |             |
| score      | Tensor       |       | float32 |             |
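For a tabular view of these fields, TFDS can also render a few examples as a pandas DataFrame; a small sketch, with the same dataset-name assumption as above:

```python
import tensorflow_datasets as tfds

ds, info = tfds.load('answer_equivalence', split='ae_test', with_info=True)

# Render five examples as a DataFrame to eyeball the rater questions and the score.
df = tfds.as_dataframe(ds.take(5), info)
print(df[['qid', 'candidate', 'reference',
          'question_1', 'question_2', 'question_3', 'question_4', 'score']])
```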
- **Citation**:

@article{bulian-etal-2022-tomayto,
title={Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation},
author={Jannis Bulian and Christian Buck and Wojciech Gajewski and Benjamin Boerschinger and Tal Schuster},
year={2022},
eprint={2202.07654},
archivePrefix={arXiv},
primaryClass={cs.CL}
}