
snli

Description:

The SNLI corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE).

Additional Documentation: Explore on Papers With Code
Homepage: https://nlp.stanford.edu/projects/snli/
Source code: tfds.datasets.snli.Builder
Versions:
- 1.1.0 (default): No release notes.
Download size: 90.17 MiB
Dataset size: 87.00 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	10,000
`'train'`	550,152
`'validation'`	10,000
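As a rough illustration of working with these splits, the sketch below records the sizes from the table and requests a leading percentage of a split through the TFDS split-slicing syntax. The helper names `subsplit_size` and `load_subsplit` are hypothetical, and the computed size is only an estimate, since TFDS applies its own rounding when slicing by percent.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"test": 10_000, "train": 550_152, "validation": 10_000}


def subsplit_size(split: str, percent: int) -> int:
    """Estimate how many examples the first `percent` of a split contains."""
    return SPLIT_SIZES[split] * percent // 100


def load_subsplit(split: str = "train", percent: int = 1):
    """Load the first `percent` of a split, e.g. 'train[:1%]'."""
    # Imported lazily so the size helper works without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load("snli", split=f"{split}[:{percent}%]")
```

The three splits add up to 570,152 pairs, which matches the "570k" figure in the description. Calling `load_subsplit("train", 1)` downloads and prepares the dataset on first use.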

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'premise': Text(shape=(), dtype=string),
})
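The feature structure above can be explored with a short loading sketch. It reads a few validation examples and turns the integer `label` into its class name through the dataset's own `ClassLabel` feature, rather than hard-coding an index order. The helper names `format_pair` and `preview_snli` are hypothetical, and the handling of `-1` assumes that pairs without a gold label are stored with that value, which this page does not confirm.

```python
def format_pair(premise: str, hypothesis: str, label_name: str) -> str:
    """Render one premise/hypothesis pair on a single line."""
    return f"[{label_name}] {premise} => {hypothesis}"


def preview_snli(n: int = 3) -> list[str]:
    """Load the validation split and return `n` formatted examples."""
    # Imported lazily so format_pair stays usable without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load("snli", split="validation", with_info=True)
    label_feature = info.features["label"]  # ClassLabel with num_classes=3
    lines = []
    for ex in tfds.as_numpy(ds.take(n)):
        label = int(ex["label"])
        # Assumption: pairs without a gold label carry the value -1.
        name = label_feature.int2str(label) if label >= 0 else "no-gold-label"
        lines.append(
            format_pair(
                ex["premise"].decode("utf-8"),   # Text features arrive as bytes
                ex["hypothesis"].decode("utf-8"),
                name,
            )
        )
    return lines
```

Calling `preview_snli()` returns a list of strings, one per example, which can be printed directly.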

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
label	ClassLabel	int64
premise	Text	string

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
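Because no supervised keys are defined, `as_supervised=True` is not available for this dataset. One possible workaround, sketched here under assumptions, is to map each feature dictionary to an `(inputs, label)` tuple by hand. The helper names are hypothetical, and the filter assumes that pairs lacking a gold label are encoded as `-1`, which should be checked against the loaded data.

```python
def has_gold_label(example) -> bool:
    """Keep only examples with a valid class index (assumes -1 marks 'no label')."""
    return example["label"] >= 0


def to_supervised(example):
    """Convert a feature dict into ((premise, hypothesis), label)."""
    return (example["premise"], example["hypothesis"]), example["label"]


def load_supervised(split: str = "train"):
    """Load a split as a tf.data.Dataset of ((premise, hypothesis), label)."""
    # Imported lazily so the two helpers above work on plain Python dicts too.
    import tensorflow_datasets as tfds

    ds = tfds.load("snli", split=split)
    return ds.filter(has_gold_label).map(to_supervised)
```

Both helpers use only indexing and comparison, so they behave the same on dictionaries of tensors inside `tf.data` and on ordinary Python dictionaries.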

Citation:

@inproceedings{snli:emnlp2015,
    Author = {Bowman, Samuel R. and Angeli, Gabor and Potts, Christopher and Manning, Christopher D.},
    Booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    Publisher = {Association for Computational Linguistics},
    Title = {A large annotated corpus for learning natural language inference},
    Year = {2015}
}