mslr_web

Description:

MSLR-WEB comprises two large-scale Learning-to-Rank datasets released by Microsoft Research. The first dataset (called "30k") contains roughly 30,000 queries (the 30k splits listed below total 31,531) and the second dataset (called "10k") contains 10,000 queries. Each dataset consists of query-document pairs represented as feature vectors, together with the corresponding relevance judgment labels.

You can specify whether to use the "10k" or "30k" version of the dataset, and a corresponding fold, as follows:

import tensorflow_datasets as tfds
ds = tfds.load("mslr_web/30k_fold1")

If only mslr_web is specified, the mslr_web/10k_fold1 option is selected by default:

# This is the same as `tfds.load("mslr_web/10k_fold1")`
ds = tfds.load("mslr_web")
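
The config names above follow the pattern `<version>_fold<n>`. As a minimal sketch, a small helper can build and validate the name before loading. The helper names `mslr_web_config` and `load_mslr_web` are hypothetical and not part of TFDS. The range of five folds is taken from the configs listed on this page.

```python
def mslr_web_config(version: str = "10k", fold: int = 1) -> str:
    """Builds a config name such as "mslr_web/30k_fold1" (hypothetical helper)."""
    if version not in ("10k", "30k"):
        raise ValueError(f"version must be '10k' or '30k', got {version!r}")
    if fold not in range(1, 6):
        raise ValueError(f"fold must be between 1 and 5, got {fold!r}")
    return f"mslr_web/{version}_fold{fold}"


def load_mslr_web(version: str = "10k", fold: int = 1, split: str = "train"):
    """Loads one split of the chosen config (hypothetical wrapper around tfds.load)."""
    # Imported lazily so the name helper works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(mslr_web_config(version, fold), split=split)


# With no arguments the helper mirrors the default config.
print(mslr_web_config())
print(mslr_web_config("30k", 3))
```

Calling `load_mslr_web("30k", 1)` would then be equivalent to `tfds.load("mslr_web/30k_fold1", split="train")`. Note that it downloads the full archive on first use.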

Homepage: https://www.microsoft.com/en-us/research/project/mslr/
Source code: tfds.ranking.mslr_web.MslrWeb
Versions:
- 1.0.0: Initial release.
- 1.1.0: Bundle features into a single 'float_features' feature.
- 1.2.0 (default): Add query and document identifiers.
Auto-cached (documentation): No
Feature structure:

FeaturesDict({
    'doc_id': Tensor(shape=(None,), dtype=int64),
    'float_features': Tensor(shape=(None, 136), dtype=float64),
    'label': Tensor(shape=(None,), dtype=float64),
    'query_id': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
doc_id	Tensor	(None,)	int64
float_features	Tensor	(None, 136)	float64
label	Tensor	(None,)	float64
query_id	Text		string
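
The leading `None` dimension on `doc_id`, `float_features` and `label`, next to a scalar `query_id`, suggests that each example groups all documents of one query. The sketch below works under that assumption. It uses a synthetic example built from plain Python lists (the values are made up, not taken from the dataset) and a hypothetical helper, `flatten_query`, that turns one such example into per-document rows.

```python
NUM_FEATURES = 136  # width of 'float_features' in the feature structure above


def flatten_query(example):
    """Turns one per-query example into (query_id, doc_id, label, features) rows."""
    rows = []
    for doc_id, label, features in zip(
        example["doc_id"], example["label"], example["float_features"]
    ):
        if len(features) != NUM_FEATURES:
            raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")
        rows.append((example["query_id"], doc_id, label, features))
    return rows


# Synthetic example with three documents; all values are illustrative only.
example = {
    "query_id": "1",
    "doc_id": [0, 1, 2],
    "label": [2.0, 0.0, 1.0],
    "float_features": [[0.0] * NUM_FEATURES for _ in range(3)],
}

rows = flatten_query(example)
for query_id, doc_id, label, _ in rows:
    print(query_id, doc_id, label)
```

With real data, an example obtained from `tfds.as_numpy(ds)` should expose the same keys, with NumPy arrays in place of the lists used here.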

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{DBLP:journals/corr/QinL13,
  author    = {Tao Qin and Tie{-}Yan Liu},
  title     = {Introducing {LETOR} 4.0 Datasets},
  journal   = {CoRR},
  volume    = {abs/1306.2597},
  year      = {2013},
  url       = {http://arxiv.org/abs/1306.2597},
  timestamp = {Mon, 01 Jul 2013 20:31:25 +0200},
  biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/QinL13},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}

mslr_web/10k_fold1 (default config)

Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:

Split	Examples
`'test'`	2,000
`'train'`	6,000
`'vali'`	2,000
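
Note that the validation split is named `'vali'`, not `'validation'`. As a small sketch, the split sizes from the table above can be turned into fractions to confirm the 60/20/20 partition of the 10,000 queries. The `split_fractions` helper is hypothetical.

```python
# Split sizes copied from the table above for mslr_web/10k_fold1.
SPLIT_SIZES = {"train": 6000, "vali": 2000, "test": 2000}


def split_fractions(sizes):
    """Returns each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


fractions = split_fractions(SPLIT_SIZES)
print(sum(SPLIT_SIZES.values()), fractions)
```

To load the validation split itself, pass the name as written in the table, for example `tfds.load("mslr_web/10k_fold1", split="vali")`.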

mslr_web/10k_fold2

Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:

Split	Examples
`'test'`	2,000
`'train'`	6,000
`'vali'`	2,000

mslr_web/10k_fold3

Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:

Split	Examples
`'test'`	2,000
`'train'`	6,000
`'vali'`	2,000

mslr_web/10k_fold4

Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:

Split	Examples
`'test'`	2,000
`'train'`	6,000
`'vali'`	2,000

mslr_web/10k_fold5

Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:

Split	Examples
`'test'`	2,000
`'train'`	6,000
`'vali'`	2,000

mslr_web/30k_fold1

Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:

Split	Examples
`'test'`	6,306
`'train'`	18,919
`'vali'`	6,306

mslr_web/30k_fold2

Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:

Split	Examples
`'test'`	6,307
`'train'`	18,918
`'vali'`	6,306

mslr_web/30k_fold3

Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:

Split	Examples
`'test'`	6,306
`'train'`	18,918
`'vali'`	6,307

mslr_web/30k_fold4

Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:

Split	Examples
`'test'`	6,306
`'train'`	18,919
`'vali'`	6,306

mslr_web/30k_fold5

Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:

Split	Examples
`'test'`	6,306
`'train'`	18,919
`'vali'`	6,306
