yahoo_ltrc
Warning: Manual download required. See instructions below.
The Yahoo Learning to Rank Challenge dataset (also called "C14") is a
Learning-to-Rank dataset released by Yahoo. The dataset consists of
query-document pairs represented as feature vectors and corresponding relevance
judgment labels.
The dataset contains two versions:
- `set1`: Containing 709,877 query-document pairs.
- `set2`: Containing 172,870 query-document pairs.
You can specify whether to use the `set1` or `set2` version of the dataset as
follows:

    import tensorflow_datasets as tfds

    ds = tfds.load("yahoo_ltrc/set1")
    ds = tfds.load("yahoo_ltrc/set2")
If only `yahoo_ltrc` is specified, the `yahoo_ltrc/set1` option is selected by
default:

    # This is the same as `tfds.load("yahoo_ltrc/set1")`
    ds = tfds.load("yahoo_ltrc")
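Because the source archive has to be downloaded manually, a load call usually also needs to be told where that archive lives. The sketch below is one possible way to wire this up, not the catalog's own recipe. `dataset_name` and `load_yahoo_ltrc` are hypothetical helper names introduced here, and the directory passed as `manual_dir` is a placeholder for wherever the archive was placed. `tensorflow_datasets` is imported inside the loader so the name helper works without it installed.

```python
VERSIONS = ("set1", "set2")


def dataset_name(version="set1"):
    """Return the TFDS name for a config, e.g. 'yahoo_ltrc/set1'."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version {version!r}; expected one of {VERSIONS}")
    return f"yahoo_ltrc/{version}"


def load_yahoo_ltrc(version="set1", split="train", manual_dir=None):
    """Load one split; `manual_dir` is assumed to hold the manual download."""
    import tensorflow_datasets as tfds  # imported lazily; only needed here

    kwargs = {}
    if manual_dir is not None:
        # Point TFDS at the directory containing the manually downloaded file.
        config = tfds.download.DownloadConfig(manual_dir=manual_dir)
        kwargs["download_and_prepare_kwargs"] = {"download_config": config}
    return tfds.load(dataset_name(version), split=split, **kwargs)
```

A call such as `load_yahoo_ltrc("set2", split="vali", manual_dir="/path/to/manual")` would then return the `'vali'` split of `yahoo_ltrc/set2`, provided the archive is present in that (placeholder) directory.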
Citation:

    @inproceedings{chapelle2011yahoo,
      title={Yahoo! learning to rank challenge overview},
      author={Chapelle, Olivier and Chang, Yi},
      booktitle={Proceedings of the learning to rank challenge},
      pages={1--24},
      year={2011},
      organization={PMLR}
    }
yahoo_ltrc/set1 (default config)
Splits:

| Split | Examples |
|-----------|----------|
| `'test'` | 6,983 |
| `'train'` | 19,944 |
| `'vali'` | 2,994 |
Feature structure:

    FeaturesDict({
        'doc_id': Tensor(shape=(None,), dtype=int64),
        'float_features': Tensor(shape=(None, 699), dtype=float64),
        'label': Tensor(shape=(None,), dtype=float64),
        'query_id': Text(shape=(), dtype=string),
    })
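Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|----------------|--------------|-------------|---------|-------------|
| | FeaturesDict | | | |
| doc_id | Tensor | (None,) | int64 | |
| float_features | Tensor | (None, 699) | float64 | |
| label | Tensor | (None,) | float64 | |
| query_id | Text | | string | |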
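The leading `None` dimension in the shapes above suggests that each example holds all documents of one query, with a scalar `query_id`. As a rough illustration of that layout, the sketch below builds a synthetic example with the documented keys, shapes, and dtypes, then splits it into one record per query-document pair. The helper names `make_synthetic_example` and `to_pairs` are hypothetical, the values are random rather than real data, and representing `query_id` as bytes is an assumption about how the string feature appears after conversion to NumPy.

```python
import numpy as np

NUM_FEATURES = 699  # width of `float_features` in the set1 config


def make_synthetic_example(num_docs, num_features=NUM_FEATURES, seed=0):
    """Build a fake example with the documented keys, shapes and dtypes."""
    rng = np.random.default_rng(seed)
    return {
        "doc_id": np.arange(num_docs, dtype=np.int64),
        "float_features": rng.random((num_docs, num_features)),
        # Arbitrary placeholder labels, not real relevance judgments.
        "label": rng.integers(0, 5, size=num_docs).astype(np.float64),
        "query_id": b"synthetic-query",
    }


def to_pairs(example):
    """Split one per-query example into one record per query-document pair."""
    return [
        {
            "query_id": example["query_id"],
            "doc_id": int(doc_id),
            "label": float(label),
            "features": features,
        }
        for doc_id, label, features in zip(
            example["doc_id"], example["label"], example["float_features"]
        )
    ]


example = make_synthetic_example(num_docs=3)
pairs = to_pairs(example)
print(len(pairs), pairs[0]["features"].shape)
```

Under these assumptions, each record carries one 699-dimensional feature vector, and the number of records equals the number of documents listed for the query.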
yahoo_ltrc/set2
Splits:

| Split | Examples |
|-----------|----------|
| `'test'` | 3,798 |
| `'train'` | 1,266 |
| `'vali'` | 1,266 |
Feature structure:

    FeaturesDict({
        'doc_id': Tensor(shape=(None,), dtype=int64),
        'float_features': Tensor(shape=(None, 700), dtype=float64),
        'label': Tensor(shape=(None,), dtype=float64),
        'query_id': Text(shape=(), dtype=string),
    })
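Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|----------------|--------------|-------------|---------|-------------|
| | FeaturesDict | | | |
| doc_id | Tensor | (None,) | int64 | |
| float_features | Tensor | (None, 700) | float64 | |
| label | Tensor | (None,) | float64 | |
| query_id | Text | | string | |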
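Since the per-query document lists can have different lengths (the `None` dimension), examples cannot be stacked into a dense batch directly. One common workaround, sketched below with NumPy on synthetic data, is to pad every list to the longest one in the batch and keep a boolean mask of the real entries. The helper `pad_batch` and the padding label `-1.0` are illustrative assumptions, not part of the dataset or of TFDS.

```python
import numpy as np


def pad_batch(examples, pad_label=-1.0):
    """Stack per-query examples into dense arrays padded to the longest list."""
    max_docs = max(len(ex["label"]) for ex in examples)
    num_features = examples[0]["float_features"].shape[1]
    features = np.zeros((len(examples), max_docs, num_features), dtype=np.float64)
    labels = np.full((len(examples), max_docs), pad_label, dtype=np.float64)
    mask = np.zeros((len(examples), max_docs), dtype=bool)
    for i, ex in enumerate(examples):
        n = len(ex["label"])
        features[i, :n] = ex["float_features"]
        labels[i, :n] = ex["label"]
        mask[i, :n] = True  # True marks real documents, False marks padding
    return features, labels, mask


# Two synthetic queries with 2 and 4 documents, using the set2 feature width.
rng = np.random.default_rng(0)
examples = [
    {"float_features": rng.random((n, 700)), "label": np.zeros(n)}
    for n in (2, 4)
]
features, labels, mask = pad_batch(examples)
print(features.shape, labels.shape, mask.sum(axis=1))
```

The mask lets a downstream loss or metric ignore padded positions instead of treating the placeholder label as a real judgment.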
Last updated 2023-01-19 UTC.
Homepage: https://research.yahoo.com/datasets

Source code: `tfds.ranking.yahoo_ltrc.YahooLTRC` (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/ranking/yahoo_ltrc/yahoo_ltrc.py)

Versions:
- `1.0.0`: Initial release.
- `1.1.0` (default): Add query and document identifiers.

Download size: `Unknown size`

Manual download instructions: This dataset requires you to download the source data manually into `download_config.manual_dir` (defaults to `~/tensorflow_datasets/downloads/manual/`). Request access for the C14 Yahoo Learning To Rank Challenge dataset on https://research.yahoo.com/datasets. Extract the downloaded `dataset.tgz` file and place the `ltrc_yahoo.tar.bz2` file in `manual_dir/`.

Supervised keys (see the `as_supervised` doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): `None`

Figure (tfds.show_examples): Not supported.

Dataset size:
- `yahoo_ltrc/set1`: `795.39 MiB`
- `yahoo_ltrc/set2`: `194.92 MiB`

Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching):
- `yahoo_ltrc/set1`: No
- `yahoo_ltrc/set2`: Yes