- Description:
ASQA is the first long-form question answering dataset that focuses on ambiguous
factoid questions. Unlike previous long-form answer datasets, each
question is annotated with both long-form answers and extractive question-answer
pairs, which should be answerable from the generated passage. A generated
long-form answer is evaluated using both ROUGE and QA accuracy. We show
that these evaluation metrics correlate well with human judgment. In this
repository we release the ASQA dataset, together with the evaluation code:
https://github.com/google-research/language/tree/master/language/asqa
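The requirement that the extractive question-answer pairs be answerable from the generated passage can be illustrated with a rough string-match check. This is a minimal sketch, not the official evaluation code from the repository above: `normalize` and `short_answer_recall` are hypothetical helpers, the example data is invented, and verbatim matching is only a crude stand-in for the QA accuracy metric.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching is less brittle."""
    return " ".join(text.lower().split())


def short_answer_recall(long_answer: str, qa_pairs: List[Dict]) -> float:
    """Fraction of QA pairs with at least one short answer found verbatim
    in the long answer. A string-match proxy, NOT the official QA metric."""
    if not qa_pairs:
        return 0.0
    haystack = normalize(long_answer)
    hits = 0
    for pair in qa_pairs:
        # Skip empty answers, which would trivially match any text.
        answers = [normalize(a) for a in pair["short_answers"] if a.strip()]
        if any(a in haystack for a in answers):
            hits += 1
    return hits / len(qa_pairs)


# Hypothetical example data, not taken from the dataset.
qa_pairs = [
    {"question": "Who played Batman in the 1989 film?",
     "short_answers": ["Michael Keaton"]},
    {"question": "Who played Batman in the 2022 film?",
     "short_answers": ["Robert Pattinson"]},
]
answer = ("Michael Keaton played Batman in 1989, "
          "and Christian Bale took the role in 2005.")
print(short_answer_recall(answer, qa_pairs))  # 0.5: one of two pairs is covered
```

For reportable numbers, use the evaluation code released in the repository rather than this proxy.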
Homepage: https://github.com/google-research/language/tree/master/language/asqa
Source code: tfds.datasets.asqa.Builder
Versions:
1.0.0 (default): Initial release.
2.0.0: Sample ID goes from int32 (overflowing) to int64.
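The int32 overflow mentioned for version 2.0.0 can be reproduced in a few lines. This is a sketch using the standard-library `ctypes` module, which truncates values without overflow checking; the sample ID below is a hypothetical value chosen only because it exceeds the int32 range.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(value: int) -> int:
    """Truncate to a signed 32-bit integer (wraps silently on overflow)."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """Truncate to a signed 64-bit integer."""
    return ctypes.c_int64(value).value


# Hypothetical sample ID, larger than INT32_MAX but within the int64 range.
sample_id = 7089015503030534144

print(sample_id > INT32_MAX)               # True: does not fit in int32
print(as_int32(sample_id) == sample_id)    # False: the value is corrupted
print(as_int64(sample_id) == sample_id)    # True: int64 stores it intact
```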
Download size: 17.86 MiB
Dataset size: 14.50 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'dev' | 948 |
'train' | 4,353 |
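A minimal loading sketch, assuming the `tensorflow_datasets` package is installed and that the dataset is registered under the name `asqa`. The `load_asqa` wrapper and the `EXPECTED_EXAMPLES` constant are hypothetical helpers introduced here for illustration; the import is deferred so the split check works even without the package.

```python
from typing import Any

# Split sizes as listed in the splits table above.
EXPECTED_EXAMPLES = {"dev": 948, "train": 4353}


def load_asqa(split: str = "train", version: str = "2.0.0") -> Any:
    """Load one ASQA split through TFDS, pinning the dataset version."""
    if split not in EXPECTED_EXAMPLES:
        raise ValueError(
            f"Unknown split {split!r}; expected one of {sorted(EXPECTED_EXAMPLES)}"
        )
    # Imported lazily: only needed when data is actually loaded.
    import tensorflow_datasets as tfds

    # "name:version" selects a specific version of the dataset.
    return tfds.load(f"asqa:{version}", split=split)


# Usage (downloads the data on first call):
# ds = load_asqa("dev")
# for example in ds.take(1):
#     print(example["ambiguous_question"])
```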
- Feature structure:
FeaturesDict({
'ambiguous_question': Text(shape=(), dtype=string),
'annotations': Sequence({
'knowledge': Sequence({
'content': Text(shape=(), dtype=string),
'wikipage': Text(shape=(), dtype=string),
}),
'long_answer': Text(shape=(), dtype=string),
}),
'qa_pairs': Sequence({
'context': Text(shape=(), dtype=string),
'question': Text(shape=(), dtype=string),
'short_answers': Sequence(Text(shape=(), dtype=string)),
'wikipage': Text(shape=(), dtype=string),
}),
'sample_id': int64,
'wikipages': Sequence({
'title': Text(shape=(), dtype=string),
'url': Text(shape=(), dtype=string),
}),
})
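When a `Sequence` wraps a dict of features, as `qa_pairs` does here, TFDS is expected to return a dict of parallel sequences rather than a list of records. The sketch below, which assumes that layout, regroups the columns into one dict per QA pair. The `rows` helper and the example values are hypothetical, and plain Python strings stand in for the byte strings TFDS typically yields.

```python
from typing import Any, Dict, List


def rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a dict of equal-length lists into a list of per-item dicts."""
    keys = list(columns)
    lengths = {len(columns[k]) for k in keys}
    if len(lengths) > 1:
        raise ValueError(f"Columns have different lengths: {sorted(lengths)}")
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]


# Hypothetical 'qa_pairs' value in the dict-of-lists layout.
qa_pairs = {
    "context": ["", ""],
    "question": ["Who played Batman in 1989?", "Who played Batman in 2022?"],
    "short_answers": [["Michael Keaton"], ["Robert Pattinson"]],
    "wikipage": ["", ""],
}

for pair in rows(qa_pairs):
    print(pair["question"], "->", pair["short_answers"])
```

The same helper applies to `wikipages`; `annotations` nests a second `Sequence` under `knowledge`, so it would need one more level of regrouping.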
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
ambiguous_question | Text | | string | Disambiguated question from AmbigQA. |
annotations | Sequence | | | Long-form answers to the ambiguous question constructed by ASQA annotators. |
annotations/knowledge | Sequence | | | List of additional knowledge pieces. |
annotations/knowledge/content | Text | | string | A passage from Wikipedia. |
annotations/knowledge/wikipage | Text | | string | Title of the Wikipedia page the passage was taken from. |
annotations/long_answer | Text | | string | Annotation. |
qa_pairs | Sequence | | | Q&A pairs from AmbigQA which are used for disambiguation. |
qa_pairs/context | Text | | string | Additional context provided. |
qa_pairs/question | Text | | string | |
qa_pairs/short_answers | Sequence(Text) | (None,) | string | List of short answers from AmbigQA. |
qa_pairs/wikipage | Text | | string | Title of the Wikipedia page the additional context was taken from. |
sample_id | Tensor | | int64 | |
wikipages | Sequence | | | List of Wikipedia pages visited by AmbigQA annotators. |
wikipages/title | Text | | string | Title of the Wikipedia page. |
wikipages/url | Text | | string | Link to the Wikipedia page. |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
- Citation:
@misc{https://doi.org/10.48550/arxiv.2204.06092,
doi = {10.48550/ARXIV.2204.06092},
url = {https://arxiv.org/abs/2204.06092},
author = {Stelmakh, Ivan and Luan, Yi and Dhingra, Bhuwan and Chang, Ming-Wei},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {ASQA: Factoid Questions Meet Long-Form Answers},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}