- Description:
DocNLI is a large-scale dataset for document-level natural language inference (NLI). It is derived from a broad range of NLP problems and covers multiple genres of text. The premises are always at document granularity, whereas the hypotheses vary in length from single sentences to passages of hundreds of words. In contrast to some existing sentence-level NLI datasets, DocNLI contains relatively few dataset artifacts.
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/salesforce/DocNLI/
Source code: tfds.text.docnli.DocNLI
Versions:
1.0.0 (default): Initial release.
Download size: 313.89 MiB
Dataset size: 3.07 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'test' | 267,086 |
| 'train' | 942,314 |
| 'validation' | 234,258 |
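
With a dataset this large, it can help to load only a slice of a split while prototyping. The following is a minimal sketch, not an official recipe. The helper names `split_slice` and `load_docnli` are hypothetical, and the registry name `"docnli"` is an assumption inferred from the module path `tfds.text.docnli`; this page does not confirm it.

```python
# Split sizes copied from the Splits table.
SPLIT_SIZES = {"train": 942_314, "validation": 234_258, "test": 267_086}


def split_slice(split: str, percent: int) -> str:
    """Build a TFDS split string for the first `percent`% of a split."""
    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    return f"{split}[:{percent}%]"


def load_docnli(split: str, name: str = "docnli"):
    """Load one split with TFDS.

    `name` is an ASSUMED registry name; adjust it if your TFDS version
    registers the dataset under a different name.
    """
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(name, split=split)
```

For instance, `load_docnli(split_slice("validation", 1))` would request roughly the first 1% of the validation split. The first call downloads and prepares the full dataset (313.89 MiB download, 3.07 GiB on disk).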
- Feature structure:
FeaturesDict({
'hypothesis': Text(shape=(), dtype=string),
'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
'premise': Text(shape=(), dtype=string),
})
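
Each example is a flat dictionary with two string features and one integer label. As a rough illustration, the sketch below converts one NumPy-style record (the shape typically produced by `tfds.as_numpy`, where text arrives as `bytes`) into plain Python types. The helper `decode_example` and the sample record are hypothetical; the sample is not real DocNLI data, and the meaning of each label id is not stated on this page.

```python
def decode_example(example: dict) -> dict:
    """Convert one NumPy-style DocNLI record to plain Python types."""

    def to_str(value) -> str:
        # Text features arrive as UTF-8 bytes when iterating with NumPy.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return {
        "premise": to_str(example["premise"]),
        "hypothesis": to_str(example["hypothesis"]),
        "label": int(example["label"]),  # one of the 2 class ids
    }


# Hypothetical record for illustration only.
sample = {
    "premise": b"A long source document ...",
    "hypothesis": b"A short claim.",
    "label": 1,
}
decoded = decode_example(sample)
```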
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| hypothesis | Text | | string | |
| label | ClassLabel | | int64 | |
| premise | Text | | string | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
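
Because the supervised keys are `None`, loading with `as_supervised=True` is not expected to work for this dataset, so the `(inputs, label)` pairing has to be done by hand. One possible mapping is sketched below; the function name `to_supervised` and the choice to keep both texts in a dictionary are assumptions, not part of the dataset definition.

```python
def to_supervised(example):
    """Map a DocNLI feature dict to an (inputs, label) pair."""
    inputs = {
        "premise": example["premise"],
        "hypothesis": example["hypothesis"],
    }
    return inputs, example["label"]
```

The function only indexes into the dictionary, so it should work both on plain Python dicts and as the argument to `tf.data.Dataset.map` on a dataset loaded without `as_supervised`.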
- Citation:
@inproceedings{yin-etal-2021-docnli,
title={DocNLI: A Large-scale Dataset for Document-level Natural Language Inference},
author={Wenpeng Yin and Dragomir Radev and Caiming Xiong},
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
}