Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds

ds = tfds.load('huggingface:trec')
  • Description:
The Text REtrieval Conference (TREC) Question Classification dataset contains about 5,500 labeled questions in the training set and another 500 in the test set. The dataset has 6 coarse labels and 47 fine-grained (level-2) labels. The average sentence length is 10 and the vocabulary size is 8,700.

Data are collected from four sources: 4,500 English questions published by USC (Hovy et al., 2001), about 500 manually constructed questions for a few rare classes, 894 TREC 8 and TREC 9 questions, and 500 questions from TREC 10, which serve as the test set.
  • License: No known license
  • Version: 1.1.0
  • Splits:
Split Examples
'test' 500
'train' 5452
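
Only 'train' and 'test' splits are listed, so a validation set has to be carved out of 'train'. The sketch below builds split strings in the TFDS slicing syntax for that purpose. The helper names `holdout_splits` and `load_with_holdout` and the 10% default are illustrative choices, not part of TFDS, and it is an assumption that the `huggingface:trec` builder accepts percent slices the same way native TFDS datasets do.

```python
def holdout_splits(val_percent: int = 10):
    """Return (train_spec, val_spec) split strings in TFDS slicing syntax.

    Hypothetical helper: the last `val_percent` percent of 'train'
    becomes the validation set, the rest stays as training data.
    """
    if not 0 < val_percent < 100:
        raise ValueError("val_percent must be between 1 and 99")
    cut = 100 - val_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_with_holdout(val_percent: int = 10):
    """Load train, validation, and test splits (needs TFDS and network access)."""
    # Imported lazily so holdout_splits stays usable without TFDS installed.
    import tensorflow_datasets as tfds

    train_spec, val_spec = holdout_splits(val_percent)
    # Passing a list of split specs returns one dataset per spec, in order.
    return tfds.load("huggingface:trec", split=[train_spec, val_spec, "test"])
```

With TFDS installed, `train_ds, val_ds, test_ds = load_with_holdout()` would unpack the three datasets. Slicing by percent keeps the held-out examples disjoint from the training portion without hard-coding example counts.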
  • Features:
    "label-coarse": {
        "num_classes": 6,
        "names": [
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    "label-fine": {
        "num_classes": 47,
        "names": [
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"