
doc_nli

Description:

DocNLI is a large-scale dataset for document-level natural language inference (NLI). It is derived from a broad range of NLP problems and covers multiple genres of text. Premises are always full documents, whereas hypotheses range in length from single sentences to passages of hundreds of words. Compared with some existing sentence-level NLI datasets, DocNLI contains comparatively few dataset artifacts.

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/salesforce/DocNLI/
Source code: tfds.text.docnli.DocNLI
Versions:
- 1.0.0 (default): Initial release.
Download size: 313.89 MiB
Dataset size: 3.07 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	267,086
`'train'`	942,314
`'validation'`	234,258
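
The prepared dataset is about 3 GiB, so it can help to iterate on a small slice of a split first. The sketch below builds a split string in the TFDS slicing syntax (for example `'train[:1%]'`) and estimates how many examples it covers, using the counts from the table above. The names `SPLIT_SIZES`, `slice_spec`, and `approx_examples` are illustrative helpers, not part of TFDS, and the estimate is only approximate because TFDS applies its own rounding at slice boundaries.

```python
# Example counts copied from the Splits table above.
SPLIT_SIZES = {"train": 942_314, "validation": 234_258, "test": 267_086}


def slice_spec(split, percent):
    """Return a TFDS split string such as 'train[:1%]'.

    This sketch restricts itself to whole percentages.
    """
    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    if not isinstance(percent, int) or not 0 < percent <= 100:
        raise ValueError("percent must be an integer in 1..100")
    return f"{split}[:{percent}%]"


def approx_examples(split, percent):
    """Estimate how many examples the first `percent` of a split contains."""
    return round(SPLIT_SIZES[split] * percent / 100)


spec = slice_spec("train", 1)
print(spec, approx_examples("train", 1))

# With tensorflow_datasets installed, the slice would be loaded as:
#   import tensorflow_datasets as tfds
#   ds = tfds.load("doc_nli", split=spec)
```

Slicing reduces how much data is read, not how much is downloaded: the full archive is still fetched and prepared the first time the dataset is loaded.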

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'premise': Text(shape=(), dtype=string),
})
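
As a rough illustration of how this feature structure is consumed, the following sketch loads one split with `tfds.load` and prints a short summary of the first few examples. The helper names `load_doc_nli`, `summarize`, and `preview` are hypothetical. The class names are read from `info.features["label"].names` rather than hard-coded, since this page lists only the number of classes.

```python
def load_doc_nli(split="validation"):
    """Return (dataset, info) for one DocNLI split.

    Requires tensorflow_datasets; the first call downloads and prepares the data.
    """
    import tensorflow_datasets as tfds  # lazy import: only needed when loading

    return tfds.load("doc_nli", split=split, with_info=True)


def summarize(example, label_names, width=60):
    """Render one NumPy-style example (bytes strings, integer label) as text."""
    premise = example["premise"].decode("utf-8")
    hypothesis = example["hypothesis"].decode("utf-8")
    label = label_names[int(example["label"])]
    # Premises are whole documents, so truncate both texts for display.
    return f"[{label}] premise={premise[:width]!r} hypothesis={hypothesis[:width]!r}"


def preview(n=3):
    """Print summaries of the first n validation examples."""
    import tensorflow_datasets as tfds

    ds, info = load_doc_nli("validation")
    names = info.features["label"].names  # class names as stored by the builder
    for example in tfds.as_numpy(ds.take(n)):
        print(summarize(example, names))
```

Calling `preview()` triggers the download, so `summarize` is kept separate and can be tried on a hand-built dictionary first.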

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
label	ClassLabel	int64
premise	Text	string

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
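
Because the supervised keys are `None`, the `as_supervised=True` option of `tfds.load` is not available for this dataset, and any `(inputs, label)` pairing has to be built by hand. One possible mapping is sketched below; the function name `to_supervised` and the choice to keep the two texts in a dictionary are assumptions made for illustration, not something the dataset prescribes.

```python
def to_supervised(example):
    """Map a DocNLI feature dict to an (inputs, label) pair."""
    inputs = {
        "premise": example["premise"],
        "hypothesis": example["hypothesis"],
    }
    return inputs, example["label"]


# With tensorflow_datasets installed, the mapping would be applied as:
#   import tensorflow_datasets as tfds
#   ds = tfds.load("doc_nli", split="train").map(to_supervised)
```

The function only indexes into the dictionary, so it works on plain Python dictionaries as well as on the tensor dictionaries that `tf.data` passes to `map`.
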

Citation:

@inproceedings{yin-etal-2021-docnli,
    title = "DocNLI: A Large-scale Dataset for Document-level Natural Language Inference",
    author = "Wenpeng Yin and Dragomir Radev and Caiming Xiong",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
}