sci_tail
The SciTail dataset is an entailment dataset created from multiple-choice
science exams and web sentences. Each question and its correct answer choice are
converted into an assertive statement to form the hypothesis. Information
retrieval is used to obtain relevant text from a large corpus of web
sentences, and these sentences are used as the premise P. Each
premise-hypothesis pair is then annotated by crowdworkers as supports (entails)
or not (neutral) to create the SciTail dataset. The dataset contains 27,026
examples: 10,101 labeled entails and 16,925 labeled neutral.
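The label counts quoted in the description can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative, and the "majority baseline" is simply the accuracy of always predicting the more frequent label.

```python
# Label counts as stated in the dataset description.
label_counts = {"entails": 10_101, "neutral": 16_925}

total = sum(label_counts.values())
fractions = {name: count / total for name, count in label_counts.items()}

# Accuracy of a trivial classifier that always predicts the most common label.
majority_label = max(label_counts, key=label_counts.get)
majority_baseline = fractions[majority_label]

print(f"total examples: {total:,}")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
print(f"majority baseline ({majority_label}): {majority_baseline:.1%}")
```

The two counts add up to the stated total of 27,026, and the neutral class makes up a little under two thirds of the examples, so accuracy figures on this dataset are best read against that baseline.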
Splits:

| Split | Examples |
|----------------|----------|
| 'test' | 2,126 |
| 'train' | 23,097 |
| 'validation' | 1,304 |
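The split sizes can be checked the same way. This is a minimal sketch that assumes nothing beyond the counts listed for the three splits and the overall total quoted in the description. The splits sum to 26,527, which is 499 fewer than the 27,026 examples given in the description; this page does not explain the difference.

```python
# Split sizes as listed for this dataset version.
split_sizes = {"test": 2_126, "train": 23_097, "validation": 1_304}

# Total number of examples quoted in the dataset description.
described_total = 27_026

split_total = sum(split_sizes.values())
for name, size in split_sizes.items():
    print(f"{name:<10} {size:>6,}  {size / split_total:.1%}")

print(f"sum of splits: {split_total:,}")
print(f"gap to described total: {described_total - split_total}")
```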
Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'premise': Text(shape=(), dtype=string),
})
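As a rough illustration of how examples with this structure might be consumed, the sketch below decodes one record into plain Python strings. It relies on `tfds.load` and `tfds.as_numpy` from TensorFlow Datasets; the helper names `decode_example` and `iter_scitail` are hypothetical, and the mapping of label index 0 to entails and 1 to neutral is an assumption that this page does not state, so it is worth confirming against `info.features["label"].names` (available via `tfds.load(..., with_info=True)`) before relying on it.

```python
from typing import Dict, Iterator

# ASSUMPTION: this index-to-name order is not stated on this page. Verify it
# against info.features["label"].names from tfds.load(..., with_info=True).
ASSUMED_LABEL_NAMES = ("entails", "neutral")


def decode_example(example: Dict) -> Dict[str, str]:
    """Turn one raw example into plain Python strings (hypothetical helper)."""

    def to_text(value) -> str:
        # Text features arrive as bytes when iterating with tfds.as_numpy.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return {
        "premise": to_text(example["premise"]),
        "hypothesis": to_text(example["hypothesis"]),
        "label": ASSUMED_LABEL_NAMES[int(example["label"])],
    }


def iter_scitail(split: str = "validation") -> Iterator[Dict[str, str]]:
    """Yield decoded examples from one split (downloads data on first use)."""
    import tensorflow_datasets as tfds  # lazy import: only needed here

    dataset = tfds.load("sci_tail", split=split)
    for raw in tfds.as_numpy(dataset):
        yield decode_example(raw)


# Placeholder record shaped like the feature structure above (not real data).
placeholder = {
    "premise": b"example premise text",
    "hypothesis": b"example hypothesis text",
    "label": 1,
}
print(decode_example(placeholder))
```

Keeping the decoding step separate from the loading step means it can be exercised on placeholder records without downloading the dataset; calling `iter_scitail("validation")` would fetch the real data.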
Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|------------|--------------|-------|--------|-------------|
| | FeaturesDict | | | |
| hypothesis | Text | | string | |
| label | ClassLabel | | int64 | |
| premise | Text | | string | |
Citation:

@inproceedings{khot2018scitail,
  title={Scitail: A textual entailment dataset from science question answering},
  author={Khot, Tushar and Sabharwal, Ashish and Clark, Peter},
  booktitle={Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI 2018)},
  url = "http://ai2-website.s3.amazonaws.com/publications/scitail-aaai-2018_cameraready.pdf",
  year={2018}
}
Last updated 2022-12-23 UTC.
Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/scitail)
Homepage: https://allenai.org/data/scitail
Source code: tfds.datasets.sci_tail.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/sci_tail/sci_tail_dataset_builder.py)
Versions: 1.0.0 (default): Initial release.
Download size: 13.52 MiB
Dataset size: 6.01 MiB
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.