q_re_cc

  • Description:

A dataset containing 14K conversations with 81K question-answer pairs. QReCC is built on questions from TREC CAsT, QuAC and Google Natural Questions.

  • Splits:
Split     Examples
'test'    16,451
'train'   63,501
  • Feature structure:
FeaturesDict({
    'answer': Text(shape=(), dtype=string),
    'answer_url': Text(shape=(), dtype=string),
    'context': Sequence(Text(shape=(), dtype=string)),
    'conversation_id': Scalar(shape=(), dtype=int32),
    'question': Text(shape=(), dtype=string),
    'question_rewrite': Text(shape=(), dtype=string),
    'source': Text(shape=(), dtype=string),
    'turn_id': Scalar(shape=(), dtype=int32),
})
  • Feature documentation:
Feature           Class           Shape    Dtype   Description
                  FeaturesDict
answer            Text                     string
answer_url        Text                     string
context           Sequence(Text)  (None,)  string
conversation_id   Scalar                   int32   The id of the conversation.
question          Text                     string
question_rewrite  Text                     string
source            Text                     string  The original source of the data -- either QuAC, CAsT or Natural Questions
turn_id           Scalar                   int32   The id of the conversation turn, within a conversation.
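
Because each example is a single turn, rebuilding a whole conversation means grouping examples by conversation_id and ordering them by turn_id. The following is a minimal sketch of that grouping, not part of the dataset's own tooling. The records are made-up placeholders that only reuse the field names and types listed above, and group_by_conversation is a hypothetical helper name. In practice the records would likely come from something like tfds.as_numpy(tfds.load('q_re_cc', split='train')), in which case text fields arrive as byte strings.

```python
from collections import defaultdict

# Placeholder records with made-up values (NOT real dataset content).
# Only the field names and types follow the feature structure above.
records = [
    {"conversation_id": 1, "turn_id": 2, "question": "q1-2", "answer": "a1-2"},
    {"conversation_id": 2, "turn_id": 2, "question": "q2-2", "answer": "a2-2"},
    {"conversation_id": 1, "turn_id": 1, "question": "q1-1", "answer": "a1-1"},
    {"conversation_id": 2, "turn_id": 1, "question": "q2-1", "answer": "a2-1"},
    {"conversation_id": 1, "turn_id": 3, "question": "q1-3", "answer": "a1-3"},
]


def group_by_conversation(examples):
    """Return {conversation_id: [examples sorted by turn_id]}."""
    conversations = defaultdict(list)
    for example in examples:
        conversations[example["conversation_id"]].append(example)
    # Order the turns inside each conversation by their turn_id.
    for turns in conversations.values():
        turns.sort(key=lambda example: example["turn_id"])
    return dict(conversations)


conversations = group_by_conversation(records)

for conversation_id in sorted(conversations):
    turn_ids = [turn["turn_id"] for turn in conversations[conversation_id]]
    print(conversation_id, turn_ids)
```

Holding every conversation in a dictionary is fine at this dataset's size (tens of thousands of examples); a streaming approach would only matter for much larger corpora.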
  • Citation:
@article{qrecc,
  title={Open-Domain Question Answering Goes Conversational via Question Rewriting},
  author={Anantha, Raviteja and Vakulenko, Svitlana and Tu, Zhucheng and Longpre, Shayne and Pulman, Stephen and Chappidi, Srinivas},
  journal={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  year={2021}
}