trec
The Text REtrieval Conference (TREC) Question Classification dataset contains
about 5,500 labeled questions in the training set and another 500 in the test set. The dataset
has 6 coarse labels and 47 fine-grained (level-2) labels. The average sentence length is 10,
and the vocabulary size is 8,700. The data are collected from four sources: 4,500 English
questions published by USC (Hovy et al., 2001), about 500 manually constructed
questions for a few rare classes, 894 TREC 8 and TREC 9 questions, and 500
questions from TREC 10, which serve as the test set.
| Split | Examples |
|-----------|----------|
| `'test'` | 500 |
| `'train'` | 5,452 |
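The split table lists only `'train'` and `'test'`, so a validation set has to be carved out of `'train'` if one is needed. The following is a minimal sketch, assuming the TFDS split-slicing syntax (`train[:90%]`) and a 10% holdout; the helper names `holdout_splits` and `load_with_validation` are hypothetical and not part of the dataset or of TFDS.

```python
from typing import Tuple


def holdout_splits(percent_for_validation: int = 10) -> Tuple[str, str]:
    """Build TFDS split strings that carve a validation slice out of 'train'.

    Hypothetical helper: the 10% default is an assumption, not a
    recommendation from the dataset authors.
    """
    if not 0 < percent_for_validation < 100:
        raise ValueError("percent_for_validation must be between 1 and 99")
    cut = 100 - percent_for_validation
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_with_validation(percent_for_validation: int = 10):
    """Load train/validation/test datasets (downloads the data on first use)."""
    # Imported lazily so the string helper above works without TFDS installed.
    import tensorflow_datasets as tfds

    train_split, val_split = holdout_splits(percent_for_validation)
    train_ds, val_ds = tfds.load("trec", split=[train_split, val_split])
    test_ds = tfds.load("trec", split="test")
    return train_ds, val_ds, test_ds
```

Keeping the string construction separate from the download makes the slicing logic easy to check without network access.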
FeaturesDict({
'label-coarse': ClassLabel(shape=(), dtype=int64, num_classes=6),
'label-fine': ClassLabel(shape=(), dtype=int64, num_classes=47),
'text': Text(shape=(), dtype=string),
})
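Each example is a dictionary with the three keys shown in the feature structure above. The sketch below shows one way to turn such an example into a `(text, label)` pair; it assumes the examples are read through `tfds.as_numpy`, where `text` arrives as UTF-8 bytes and the labels as integers. The helper names `to_text_and_label` and `iter_trec` are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def to_text_and_label(
    example: Dict[str, Any], label_key: str = "label-coarse"
) -> Tuple[str, int]:
    """Convert one example dict into a (text, label) pair.

    `label_key` may be 'label-coarse' (6 classes) or 'label-fine' (47 classes).
    """
    text = example["text"]
    if isinstance(text, bytes):
        # Assumption: text features are stored as UTF-8 encoded bytes.
        text = text.decode("utf-8")
    return text, int(example[label_key])


def iter_trec(
    split: str = "train", label_key: str = "label-coarse"
) -> Iterator[Tuple[str, int]]:
    """Yield (text, label) pairs from the 'trec' dataset (downloads on first use)."""
    # Imported lazily so the conversion helper works without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("trec", split=split)
    for example in tfds.as_numpy(ds):
        yield to_text_and_label(example, label_key)
```

For instance, `next(iter_trec("test", "label-fine"))` would return the first test question together with its fine-grained label index.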
| Feature | Class | Shape | Dtype | Description |
|--------------|--------------|-------|--------|-------------|
| | FeaturesDict | | | |
| label-coarse | ClassLabel | | int64 | |
| label-fine | ClassLabel | | int64 | |
| text | Text | | string | |
@inproceedings{li-roth-2002-learning,
title = "Learning Question Classifiers",
author = "Li, Xin and
Roth, Dan",
booktitle = "{COLING} 2002: The 19th International Conference on Computational Linguistics",
year = "2002",
url = "https://www.aclweb.org/anthology/C02-1150",
}
@inproceedings{hovy-etal-2001-toward,
title = "Toward Semantics-Based Answer Pinpointing",
author = "Hovy, Eduard and
Gerber, Laurie and
Hermjakob, Ulf and
Lin, Chin-Yew and
Ravichandran, Deepak",
booktitle = "Proceedings of the First International Conference on Human Language Technology Research",
year = "2001",
url = "https://www.aclweb.org/anthology/H01-1069",
}
Last updated 2023-02-12 UTC.
Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/trec-10)
Homepage: https://cogcomp.seas.upenn.edu/Data/QA/QC/
Source code: tfds.datasets.trec.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/trec/trec_dataset_builder.py)
Versions: 1.0.0 (default): No release notes.
Download size: 350.79 KiB
Dataset size: 636.90 KiB
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.