q_re_cc
A dataset containing 14K conversations with 81K question-answer pairs. QReCC is
built on questions from TREC CAsT, QuAC and Google Natural Questions.
Splits:

| Split   | Examples |
|---------|----------|
| 'test'  | 16,451   |
| 'train' | 63,501   |
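The split sizes above can serve as a sanity check after loading. The following is a minimal sketch, assuming the `tensorflow_datasets` package is installed and that the dataset is registered under the name `q_re_cc` (the title of this page). The helper names `load_qrecc` and `split_fraction` are hypothetical and not part of TFDS.

```python
# Expected example counts per split, copied from the splits table.
EXPECTED_SPLIT_SIZES = {"train": 63_501, "test": 16_451}


def split_fraction(split: str) -> float:
    """Return the share of all examples that falls into `split`."""
    total = sum(EXPECTED_SPLIT_SIZES.values())
    return EXPECTED_SPLIT_SIZES[split] / total


def load_qrecc(split: str = "train"):
    """Load one split with TFDS (downloads the data on first use).

    The import lives inside the function so the rest of this sketch
    runs even when TensorFlow Datasets is not installed.
    """
    import tensorflow_datasets as tfds  # assumption: package is installed

    ds, info = tfds.load("q_re_cc", split=split, with_info=True)
    # Compare the loaded split against the documented size.
    assert info.splits[split].num_examples == EXPECTED_SPLIT_SIZES[split]
    return ds


if __name__ == "__main__":
    for name in EXPECTED_SPLIT_SIZES:
        print(f"{name}: {split_fraction(name):.1%} of all examples")
```

Calling `load_qrecc("test")` would return a `tf.data.Dataset` whose elements follow the feature structure of this dataset.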
Feature structure:

FeaturesDict({
'answer': Text(shape=(), dtype=string),
'answer_url': Text(shape=(), dtype=string),
'context': Sequence(Text(shape=(), dtype=string)),
'conversation_id': Scalar(shape=(), dtype=int32, description=The id of the conversation.),
'question': Text(shape=(), dtype=string),
'question_rewrite': Text(shape=(), dtype=string),
'source': Text(shape=(), dtype=string),
'turn_id': Scalar(shape=(), dtype=int32, description=The id of the conversation turn, within a conversation.),
})
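When examples are iterated through `tfds.as_numpy`, string features typically arrive as `bytes` and scalar features as NumPy integers. The sketch below converts one such raw example into plain Python types. The `QReCCExample` dataclass and the `decode_example` function are hypothetical helpers written for illustration; only the field names and dtypes come from the feature structure above.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class QReCCExample:
    """Plain-Python mirror of the FeaturesDict (hypothetical helper)."""

    answer: str
    answer_url: str
    context: List[str]
    conversation_id: int
    question: str
    question_rewrite: str
    source: str
    turn_id: int


def _to_str(value: Any) -> str:
    """Decode UTF-8 bytes; pass through values that are already text."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def decode_example(raw: Mapping[str, Any]) -> QReCCExample:
    """Convert one raw example (bytes / NumPy scalars) to a QReCCExample."""
    return QReCCExample(
        answer=_to_str(raw["answer"]),
        answer_url=_to_str(raw["answer_url"]),
        # `context` is a variable-length sequence of strings.
        context=[_to_str(turn) for turn in raw["context"]],
        conversation_id=int(raw["conversation_id"]),
        question=_to_str(raw["question"]),
        question_rewrite=_to_str(raw["question_rewrite"]),
        source=_to_str(raw["source"]),
        turn_id=int(raw["turn_id"]),
    )
```

A frozen dataclass is used here so that decoded examples cannot be modified by accident; a `TypedDict` or a plain dictionary would work equally well.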
Feature documentation:

| Feature          | Class          | Shape   | Dtype  | Description                                                               |
|------------------|----------------|---------|--------|---------------------------------------------------------------------------|
|                  | FeaturesDict   |         |        |                                                                           |
| answer           | Text           |         | string |                                                                           |
| answer_url       | Text           |         | string |                                                                           |
| context          | Sequence(Text) | (None,) | string |                                                                           |
| conversation_id  | Scalar         |         | int32  | The id of the conversation.                                               |
| question         | Text           |         | string |                                                                           |
| question_rewrite | Text           |         | string |                                                                           |
| source           | Text           |         | string | The original source of the data -- either QuAC, CAsT or Natural Questions |
| turn_id          | Scalar         |         | int32  | The id of the conversation turn, within a conversation.                   |
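Because each example is a single turn identified by `conversation_id` and `turn_id`, whole conversations can be rebuilt by grouping on the first and sorting on the second. The following is a minimal sketch over plain dictionaries; the function name `group_by_conversation` is hypothetical, and the sample records in the usage snippet are invented for illustration rather than taken from the dataset.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def group_by_conversation(
    examples: Iterable[Mapping[str, Any]],
) -> Dict[int, List[Mapping[str, Any]]]:
    """Group turn-level examples by conversation, ordered by turn_id."""
    conversations: Dict[int, List[Mapping[str, Any]]] = defaultdict(list)
    for example in examples:
        conversations[int(example["conversation_id"])].append(example)
    # Turns may arrive in any order, so sort each conversation in place.
    for turns in conversations.values():
        turns.sort(key=lambda example: int(example["turn_id"]))
    return dict(conversations)


if __name__ == "__main__":
    # Invented records that only carry the two id fields and a question.
    sample = [
        {"conversation_id": 1, "turn_id": 2, "question": "second"},
        {"conversation_id": 2, "turn_id": 1, "question": "other"},
        {"conversation_id": 1, "turn_id": 1, "question": "first"},
    ]
    for conv_id, turns in group_by_conversation(sample).items():
        print(conv_id, [turn["question"] for turn in turns])
```

The function holds all examples in memory, which should be acceptable for a dataset of this size but is a design choice rather than a requirement.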
Citation:

@article{qrecc,
title={Open-Domain Question Answering Goes Conversational via Question Rewriting},
author={Anantha, Raviteja and Vakulenko, Svitlana and Tu, Zhucheng and Longpre, Shayne and Pulman, Stephen and Chappidi, Srinivas},
journal={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
year={2021}
}
Last updated 2024-09-03 UTC.
Homepage: https://github.com/apple/ml-qrecc
Source code: tfds.text.qrecc.QReCC (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/qrecc/qrecc.py)
Versions: 1.0.0 (default): Initial release.
Download size: 7.60 MiB
Dataset size: 69.29 MiB
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.