- Description:
The Yahoo Learning to Rank Challenge dataset (also called "C14") is a Learning-to-Rank dataset released by Yahoo. The dataset consists of query-document pairs represented as feature vectors and corresponding relevance judgment labels.
The dataset contains two versions:
- set1: Containing 709,877 query-document pairs.
- set2: Containing 172,870 query-document pairs.
You can specify whether to use the set1 or set2 version of the dataset as
follows:
ds = tfds.load("yahoo_ltrc/set1")
ds = tfds.load("yahoo_ltrc/set2")
If only yahoo_ltrc is specified, the yahoo_ltrc/set1 option is selected by
default:
# This is the same as `tfds.load("yahoo_ltrc/set1")`
ds = tfds.load("yahoo_ltrc")
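The mapping from config to dataset name can be sketched as below. This is a minimal illustration, not part of TFDS: `yahoo_ltrc_name` and `load_split` are hypothetical helpers, and `load_split` assumes that `tensorflow_datasets` is installed and that the source data has already been downloaded manually.

```python
VALID_CONFIGS = ("set1", "set2")


def yahoo_ltrc_name(config=None):
    """Build the name passed to tfds.load (hypothetical helper).

    With no config, the bare name is returned, which the catalog says
    resolves to the default config, set1.
    """
    if config is None:
        return "yahoo_ltrc"
    if config not in VALID_CONFIGS:
        raise ValueError(f"unknown config {config!r}, expected one of {VALID_CONFIGS}")
    return f"yahoo_ltrc/{config}"


def load_split(config=None, split="train"):
    """Load one split; the import is lazy so the helper above works without TFDS."""
    import tensorflow_datasets as tfds

    return tfds.load(yahoo_ltrc_name(config), split=split)
```

The split names accepted here are the ones listed in the splits tables: 'train', 'vali', and 'test'.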
Homepage: https://research.yahoo.com/datasets
Source code: tfds.ranking.yahoo_ltrc.YahooLTRC
Versions:
- 1.0.0: Initial release.
- 1.1.0 (default): Add query and document identifiers.
Download size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Request access for the C14 Yahoo Learning To Rank Challenge dataset on https://research.yahoo.com/datasets. Extract the downloaded dataset.tgz file and place the ltrc_yahoo.tar.bz2 file in manual_dir/.
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
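Before loading, it can help to confirm that the archive is where the manual download instructions say it should be. The sketch below assumes the default manual directory and the archive name given above; `archive_is_in_place` and `load_with_manual_dir` are hypothetical helper names, and the second one assumes `tensorflow_datasets` is installed.

```python
from pathlib import Path

ARCHIVE_NAME = "ltrc_yahoo.tar.bz2"
DEFAULT_MANUAL_DIR = Path("~/tensorflow_datasets/downloads/manual").expanduser()


def archive_is_in_place(manual_dir=DEFAULT_MANUAL_DIR):
    """Return True if the extracted archive sits directly in manual_dir."""
    return (Path(manual_dir).expanduser() / ARCHIVE_NAME).is_file()


def load_with_manual_dir(manual_dir, name="yahoo_ltrc/set1"):
    """Point TFDS at a non-default manual directory, then load the dataset."""
    import tensorflow_datasets as tfds  # lazy import: only needed for loading

    download_config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    return tfds.load(
        name,
        download_and_prepare_kwargs={"download_config": download_config},
    )
```

Calling `archive_is_in_place()` first gives a clearer failure than letting dataset preparation stop partway through.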
Citation:
@inproceedings{chapelle2011yahoo,
title={Yahoo! learning to rank challenge overview},
author={Chapelle, Olivier and Chang, Yi},
booktitle={Proceedings of the learning to rank challenge},
pages={1--24},
year={2011},
organization={PMLR}
}
yahoo_ltrc/set1 (default config)
Dataset size: 795.39 MiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'test' | 6,983 |
| 'train' | 19,944 |
| 'vali' | 2,994 |
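The split sizes count examples, not query-document pairs. Assuming each example is one query (which the scalar query_id in the feature structure suggests), the figures above give a rough average list length for set1. The arithmetic can be sketched as:

```python
# Figures copied from this page; the one-example-per-query reading is an assumption.
SET1_SPLITS = {"train": 19_944, "vali": 2_994, "test": 6_983}
SET1_PAIRS = 709_877  # query-document pairs in set1


def total_examples(splits):
    """Sum the example counts over all splits."""
    return sum(splits.values())


def mean_list_size(num_pairs, splits):
    """Average number of documents per example (per query, under the assumption)."""
    return num_pairs / total_examples(splits)


if __name__ == "__main__":
    print(total_examples(SET1_SPLITS))
    print(round(mean_list_size(SET1_PAIRS, SET1_SPLITS), 2))
```

This is only an average over all splits; individual queries have variable-length document lists.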
- Feature structure:
FeaturesDict({
'doc_id': Tensor(shape=(None,), dtype=int64),
'float_features': Tensor(shape=(None, 699), dtype=float64),
'label': Tensor(shape=(None,), dtype=float64),
'query_id': Text(shape=(), dtype=string),
})
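Because the leading dimension of doc_id, float_features, and label is None, each example carries a variable number of documents. One common way to batch such examples is to truncate or pad every list to a fixed size. The following is a minimal pure-Python sketch on a toy example with 3 features instead of the real feature width; `pad_query` is a hypothetical helper, and the padding values (-1 for doc_id and label, 0.0 for features) are assumptions, not a TFDS convention.

```python
def pad_query(example, list_size, num_features, pad_doc_id=-1, pad_label=-1.0):
    """Truncate or pad one query's per-document lists to exactly list_size."""
    kept = min(len(example["label"]), list_size)
    missing = list_size - kept
    return {
        "query_id": example["query_id"],
        "doc_id": list(example["doc_id"][:kept]) + [pad_doc_id] * missing,
        "float_features": [list(row) for row in example["float_features"][:kept]]
        + [[0.0] * num_features for _ in range(missing)],
        "label": list(example["label"][:kept]) + [pad_label] * missing,
    }


# Toy example mirroring the feature structure (values are made up).
toy = {
    "query_id": "q1",
    "doc_id": [10, 11],
    "float_features": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "label": [2.0, 0.0],
}
padded = pad_query(toy, list_size=4, num_features=3)
```

A distinct padding label lets a loss or metric mask out the padded positions later.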
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| doc_id | Tensor | (None,) | int64 | |
| float_features | Tensor | (None, 699) | float64 | |
| label | Tensor | (None,) | float64 | |
| query_id | Text | | string | |
yahoo_ltrc/set2
Dataset size: 194.92 MiB
Auto-cached (documentation): Yes
Splits:
| Split | Examples |
|---|---|
| 'test' | 3,798 |
| 'train' | 1,266 |
| 'vali' | 1,266 |
- Feature structure:
FeaturesDict({
'doc_id': Tensor(shape=(None,), dtype=int64),
'float_features': Tensor(shape=(None, 700), dtype=float64),
'label': Tensor(shape=(None,), dtype=float64),
'query_id': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| doc_id | Tensor | (None,) | int64 | |
| float_features | Tensor | (None, 700) | float64 | |
| label | Tensor | (None,) | float64 | |
| query_id | Text | | string | |