

  • Description:

With system performance on existing reading comprehension benchmarks nearing or surpassing human performance, we need a new, hard dataset that improves systems' capabilities to actually read paragraphs of text. DROP is a crowdsourced, adversarially-created, 96k-question benchmark, in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets.
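The discrete operations mentioned above (addition, counting, sorting over numbers referenced in a passage) can be illustrated with a toy sketch. This is not a DROP system, just an invented passage and plain Python showing the kinds of operations a DROP question demands:

```python
import re

# Invented passage, for illustration only.
passage = ("The Bears scored on a 14-yard touchdown pass and a 32-yard "
           "field goal, while the Packers added a 21-yard touchdown run.")

# Resolve the numeric references in the passage.
yards = [int(m) for m in re.findall(r"(\d+)-yard", passage)]

count_scores = len(yards)                    # counting: how many scoring plays?
total_yards = sum(yards)                     # addition: combined yardage?
longest_first = sorted(yards, reverse=True)  # sorting: longest play first?
```

A real DROP system must first resolve which spans of the passage a question refers to before it can apply such operations; that reference resolution is what the adversarial crowdsourcing targets.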

Split     Examples
'train'   77,409
'dev'     9,536
  • Feature structure:
FeaturesDict({
    'answer': Text(shape=(), dtype=object),
    'passage': Text(shape=(), dtype=object),
    'query_id': Text(shape=(), dtype=object),
    'question': Text(shape=(), dtype=object),
    'validated_answers': Sequence(Text(shape=(), dtype=object)),
})
  • Feature documentation:
Feature            Class           Shape    Dtype   Description
answer             Text                     object
passage            Text                     object
query_id           Text                     object
question           Text                     object
validated_answers  Sequence(Text)  (None,)  object
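A single example mirrors the feature structure above: a dict keyed by the five feature names, where `validated_answers` is a sequence of strings. The sketch below uses invented values purely to show the layout (in practice the examples come from `tfds.load('drop')`):

```python
# Hypothetical example values; only the field names and types follow the
# feature spec above.
example = {
    "answer": "67",
    "passage": "The Bears scored on a 14-yard touchdown pass ...",
    "query_id": "example-0000",   # invented id
    "question": "How many total yards were gained on scoring plays?",
    "validated_answers": ["67"],  # Sequence(Text): zero or more strings
}

# Field access follows the FeaturesDict layout.
fields = sorted(example.keys())
```

Note that `answer` is a single gold string while `validated_answers` holds the additional crowd-validated answers, so evaluation scripts typically score a prediction against all of them.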
  • Citation:
@inproceedings{Dua2019DROP,
  author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
  title={{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs},
  booktitle={Proc. of NAACL},
  year={2019}
}