yahoo_ltrc

  • Description:

The Yahoo Learning to Rank Challenge dataset (also called "C14") is a Learning-to-Rank dataset released by Yahoo. The dataset consists of query-document pairs represented as feature vectors and corresponding relevance judgment labels.

The dataset comes in two versions:

  • set1: contains 709,877 query-document pairs.
  • set2: contains 172,870 query-document pairs.

You can specify whether to use the set1 or set2 version of the dataset as follows:

import tensorflow_datasets as tfds
ds = tfds.load("yahoo_ltrc/set1")
ds = tfds.load("yahoo_ltrc/set2")

If only yahoo_ltrc is specified, the yahoo_ltrc/set1 option is selected by default:

# This is the same as `tfds.load("yahoo_ltrc/set1")`
ds = tfds.load("yahoo_ltrc")

  • Homepage: https://research.yahoo.com/datasets

  • Source code: tfds.ranking.yahoo_ltrc.YahooLTRC

  • Versions:

    • 1.0.0: Initial release.
    • 1.1.0 (default): Add query and document identifiers.

  • Download size: Unknown size

  • Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
    Request access to the C14 Yahoo Learning to Rank Challenge dataset at https://research.yahoo.com/datasets. Extract the downloaded dataset.tgz file and place the ltrc_yahoo.tar.bz2 file in manual_dir/.
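The manual download step above can be wired into loading roughly as follows. This is a minimal sketch, not the builder's own code: `archive_present` and `load_yahoo_ltrc` are hypothetical helper names, and it assumes the archive has already been placed in the directory passed in. It relies on `tfds.download.DownloadConfig(manual_dir=...)` and the `download_and_prepare_kwargs` argument of `tfds.load`.

```python
from pathlib import Path

# Archive name taken from the manual download instructions above.
ARCHIVE_NAME = "ltrc_yahoo.tar.bz2"


def archive_present(manual_dir):
    """Returns True if the manually downloaded archive is in manual_dir."""
    return (Path(manual_dir).expanduser() / ARCHIVE_NAME).is_file()


def load_yahoo_ltrc(manual_dir, config="set1"):
    """Hypothetical helper: loads a config, reading the archive from manual_dir."""
    # Imported lazily so that archive_present() works without TFDS installed.
    import tensorflow_datasets as tfds

    manual_dir = Path(manual_dir).expanduser()
    if not archive_present(manual_dir):
        raise FileNotFoundError(f"{ARCHIVE_NAME} not found in {manual_dir}")
    # Point TFDS at the directory that holds the manually downloaded archive.
    download_config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    return tfds.load(
        f"yahoo_ltrc/{config}",
        download_and_prepare_kwargs={"download_config": download_config},
    )
```

Checking for the archive first gives a clearer error than letting dataset preparation fail partway through; if the default manual directory is used, a plain `tfds.load("yahoo_ltrc")` should be enough.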

  • Supervised keys (See as_supervised doc): None

  • Figure (tfds.show_examples): Not supported.

  • Citation:

@inproceedings{chapelle2011yahoo,
  title={Yahoo! learning to rank challenge overview},
  author={Chapelle, Olivier and Chang, Yi},
  booktitle={Proceedings of the learning to rank challenge},
  pages={1--24},
  year={2011},
  organization={PMLR}
}

yahoo_ltrc/set1 (default config)

  • Dataset size: 795.39 MiB

  • Auto-cached (documentation): No

  • Splits:

Split    Examples
'test'   6,983
'train'  19,944
'vali'   2,994

  • Feature structure:

FeaturesDict({
    'doc_id': Tensor(shape=(None,), dtype=int64),
    'float_features': Tensor(shape=(None, 699), dtype=float64),
    'label': Tensor(shape=(None,), dtype=float64),
    'query_id': Text(shape=(), dtype=string),
})

  • Feature documentation:

Feature         Class         Shape        Dtype    Description
                FeaturesDict
doc_id          Tensor        (None,)      int64
float_features  Tensor        (None, 699)  float64
label           Tensor        (None,)      float64
query_id        Text                       string
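The shapes above suggest that each example holds one query together with a variable-length list of documents (the `None` dimension). As a rough illustration of that layout, the sketch below flattens one such example into per-document rows and checks the feature width. The `flatten_example` helper and the synthetic example are assumptions made for illustration; the example is a plain-Python stand-in, not real data from the dataset.

```python
# Number of float features per config, taken from the feature structures
# listed on this page.
NUM_FEATURES = {"set1": 699, "set2": 700}


def flatten_example(example, config="set1"):
    """Hypothetical helper: turns one per-query example into per-document rows."""
    doc_ids = example["doc_id"]
    labels = example["label"]
    features = example["float_features"]
    # All list-valued features should share the same leading dimension.
    if not (len(doc_ids) == len(labels) == len(features)):
        raise ValueError("doc_id, label and float_features differ in length")
    width = NUM_FEATURES[config]
    rows = []
    for doc_id, label, vector in zip(doc_ids, labels, features):
        if len(vector) != width:
            raise ValueError(f"expected {width} features, got {len(vector)}")
        rows.append((example["query_id"], doc_id, label, list(vector)))
    return rows


# Synthetic stand-in for one set1 example: one query with three documents.
example = {
    "query_id": "q1",
    "doc_id": [0, 1, 2],
    "label": [1.0, 0.0, 3.0],
    "float_features": [[0.0] * 699 for _ in range(3)],
}
rows = flatten_example(example)
```

With real data, the same function could be applied to the dictionaries yielded by `tfds.as_numpy(ds["train"])`, where `query_id` would arrive as bytes rather than `str`.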

yahoo_ltrc/set2

  • Dataset size: 194.92 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split    Examples
'test'   3,798
'train'  1,266
'vali'   1,266

  • Feature structure:

FeaturesDict({
    'doc_id': Tensor(shape=(None,), dtype=int64),
    'float_features': Tensor(shape=(None, 700), dtype=float64),
    'label': Tensor(shape=(None,), dtype=float64),
    'query_id': Text(shape=(), dtype=string),
})

  • Feature documentation:

Feature         Class         Shape        Dtype    Description
                FeaturesDict
doc_id          Tensor        (None,)      int64
float_features  Tensor        (None, 700)  float64
label           Tensor        (None,)      float64
query_id        Text                       string
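The split sizes count examples, while the description counts query-document pairs. Assuming each example corresponds to one query and that the pair counts cover all three splits, the average list length per query can be estimated from the numbers on this page. The helper name below is hypothetical.

```python
# Counts copied from this page.
PAIRS = {"set1": 709_877, "set2": 172_870}
QUERIES_PER_SPLIT = {
    "set1": {"train": 19_944, "vali": 2_994, "test": 6_983},
    "set2": {"train": 1_266, "vali": 1_266, "test": 3_798},
}


def avg_docs_per_query(config):
    """Average number of documents per query, under the assumptions above."""
    total_queries = sum(QUERIES_PER_SPLIT[config].values())
    return PAIRS[config] / total_queries


for config in PAIRS:
    print(config, round(avg_docs_per_query(config), 1))
```

Under these assumptions both configs average a few dozen documents per query, which gives a rough sense of the list sizes to plan for when padding or truncating.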