criteo

  • Description:

Criteo Uplift Modeling Dataset

This dataset is released along with the paper “A Large Scale Benchmark for Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab) and Massih-Reza Amini (LIG, Grenoble INP).

This work was published in: AdKDD 2018 Workshop, in conjunction with KDD 2018.

Data description

This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each one representing a user with 12 features (f0 to f11), a treatment indicator and 2 labels (visits and conversions).
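Because treatment is assigned at random, the simplest uplift estimate is the difference in mean outcome between treated and control users. The sketch below computes this naive average effect in plain Python. The function name `average_treatment_effect` and the toy values are hypothetical. This is only the population-level estimate, not the per-user uplift models benchmarked in the paper.

```python
from typing import Sequence


def average_treatment_effect(treatment: Sequence[int], outcome: Sequence[int]) -> float:
    """Mean outcome of treated users (1) minus mean outcome of control users (0).

    Under randomized assignment this difference estimates the average uplift.
    """
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control user")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy data, made up for illustration (not taken from the dataset)
t = [1, 1, 1, 1, 0, 0]
visit = [1, 1, 1, 0, 0, 1]
print(average_treatment_effect(t, visit))  # → 0.25
```

The same function applies to the conversion label by passing conversions as `outcome`.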

Fields

Here is a detailed description of the fields (they are comma-separated in the file):

  • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
  • treatment: treatment group (1 = treated, 0 = control)
  • conversion: whether a conversion occurred for this user (binary, label)
  • visit: whether a visit occurred for this user (binary, label)
  • exposure: treatment effect, whether the user has been effectively exposed (binary)
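As a rough sketch of reading the raw CSV with only the standard library, the code below maps each line to a typed record using the field names listed above. It assumes the file's first line is a header that names these columns, and that the binary columns may appear as either "1" or "1.0". The names `CriteoRow` and `parse_rows` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO

# Column names as listed in the Fields section
FEATURE_NAMES = [f"f{i}" for i in range(12)]
OTHER_COLUMNS = ["treatment", "conversion", "visit", "exposure"]


@dataclass
class CriteoRow:
    features: List[float]  # f0..f11 in numeric order
    treatment: int         # 1 = treated, 0 = control
    conversion: int
    visit: int
    exposure: int


def parse_rows(stream: TextIO) -> Iterator[CriteoRow]:
    """Yield one typed record per CSV line (assumes a header row)."""
    for rec in csv.DictReader(stream):
        # int(float(...)) accepts both "1" and "1.0" for the binary columns
        yield CriteoRow(
            features=[float(rec[name]) for name in FEATURE_NAMES],
            treatment=int(float(rec["treatment"])),
            conversion=int(float(rec["conversion"])),
            visit=int(float(rec["visit"])),
            exposure=int(float(rec["exposure"])),
        )


# Made-up sample line, for illustration only
header = ",".join(FEATURE_NAMES + OTHER_COLUMNS)
sample = header + "\n" + ",".join(["0.5"] * 12 + ["1", "0", "1", "1"]) + "\n"
row = next(parse_rows(io.StringIO(sample)))
print(row.treatment, row.visit, len(row.features))  # → 1 1 12
```

Because `parse_rows` is a generator, it streams the file rather than loading all rows into memory.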

Key figures

  • Format: CSV
  • Size: 459 MB (compressed)
  • Rows: 25,309,483
  • Average Visit Rate: 0.04132
  • Average Conversion Rate: 0.00229
  • Treatment Ratio: 0.846
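The rates above are plain means over all rows: the visit rate is the fraction of rows with visit = 1, the conversion rate is the fraction with conversion = 1, and the treatment ratio is the fraction with treatment = 1. A minimal single-pass sketch follows, assuming rows arrive as hypothetical (treatment, visit, conversion) tuples of 0/1 values. The toy rows are invented and do not reproduce the figures above.

```python
from typing import Dict, Iterable, Tuple


def key_figures(rows: Iterable[Tuple[int, int, int]]) -> Dict[str, float]:
    """Single pass over (treatment, visit, conversion) triples of 0/1 values."""
    n = treated = visits = conversions = 0
    for treatment, visit, conversion in rows:
        n += 1
        treated += treatment
        visits += visit
        conversions += conversion
    if n == 0:
        raise ValueError("no rows")
    return {
        "average_visit_rate": visits / n,
        "average_conversion_rate": conversions / n,
        "treatment_ratio": treated / n,
    }


# Toy rows (not real data): 3 of 4 users treated, 2 visits, 1 conversion
toy = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 0)]
print(key_figures(toy))
# → {'average_visit_rate': 0.5, 'average_conversion_rate': 0.25, 'treatment_ratio': 0.75}
```

Keeping only running counters means memory use stays constant even over tens of millions of rows.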

Tasks

The dataset was collected and prepared with uplift prediction as the main task in mind. Additionally, we can foresee related usages such as, but not limited to:

  • Splits:
Split    Examples
'train'  13,979,592
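Only a 'train' split is provided, so held-out evaluation requires carving out a split yourself. TFDS split slicing (for example `split='train[:80%]'`) is one option. The plain-Python sketch below instead shuffles row indices with a fixed seed. The function name and the 20% default are illustrative assumptions. A uniform random split preserves the treatment ratio only in expectation, so stratifying by treatment is a reasonable variant.

```python
import random
from typing import List, Tuple


def train_test_indices(n: int, test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a fixed seed and cut off a held-out test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG: reproducible, no global state
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_indices(10, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))  # → 8 2
```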
  • Feature structure:
FeaturesDict({
    'conversion': tf.bool,
    'exposure': tf.bool,
    'f0': tf.float32,
    'f1': tf.float32,
    'f10': tf.float32,
    'f11': tf.float32,
    'f2': tf.float32,
    'f3': tf.float32,
    'f4': tf.float32,
    'f5': tf.float32,
    'f6': tf.float32,
    'f7': tf.float32,
    'f8': tf.float32,
    'f9': tf.float32,
    'treatment': tf.int64,
    'visit': tf.bool,
})
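The FeaturesDict above lists keys in string order (f0, f1, f10, f11, f2, ...). Building a feature vector by sorting key names as strings would therefore silently permute the columns. The sketch below orders the f-keys by their numeric index instead. It assumes an example arrives as a plain mapping from feature name to value, and `feature_vector` is a hypothetical helper.

```python
import re
from typing import List, Mapping


def feature_vector(example: Mapping[str, float]) -> List[float]:
    """Return the f<i> values ordered by numeric index i (f0, f1, f2, ..., f11)."""
    names = [k for k in example if re.fullmatch(r"f\d+", k)]
    names.sort(key=lambda k: int(k[1:]))  # numeric, not lexicographic, order
    return [float(example[k]) for k in names]


ex = {"f10": 10.0, "f2": 2.0, "f0": 0.0, "f1": 1.0, "f11": 11.0, "treatment": 1}
print(feature_vector(ex))  # → [0.0, 1.0, 2.0, 10.0, 11.0]
```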
  • Feature documentation:
Feature     Class         Shape  Dtype       Description
            FeaturesDict
conversion  Tensor               tf.bool
exposure    Tensor               tf.bool
f0          Tensor               tf.float32
f1          Tensor               tf.float32
f10         Tensor               tf.float32
f11         Tensor               tf.float32
f2          Tensor               tf.float32
f3          Tensor               tf.float32
f4          Tensor               tf.float32
f5          Tensor               tf.float32
f6          Tensor               tf.float32
f7          Tensor               tf.float32
f8          Tensor               tf.float32
f9          Tensor               tf.float32
treatment   Tensor               tf.int64
visit       Tensor               tf.bool
  • Supervised keys (See as_supervised doc): ({'exposure': 'exposure', 'f0': 'f0', 'f1': 'f1', 'f10': 'f10', 'f11': 'f11', 'f2': 'f2', 'f3': 'f3', 'f4': 'f4', 'f5': 'f5', 'f6': 'f6', 'f7': 'f7', 'f8': 'f8', 'f9': 'f9', 'treatment': 'treatment'}, 'visit')
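The supervised keys mean that loading with `as_supervised=True` yields (features, label) pairs. The features dict holds exposure, f0 to f11 and treatment, the label is visit, and conversion appears in neither. The pure-Python sketch below mimics that mapping on a plain dict so the structure is explicit. It is not TFDS code, and the names `INPUT_KEYS`, `LABEL_KEY` and `to_supervised` are hypothetical. To model conversion uplift instead, load without `as_supervised` and pick 'conversion' as the label yourself.

```python
from typing import Any, Dict, Tuple

# Mirrors the supervised keys listed above
INPUT_KEYS = ["exposure"] + [f"f{i}" for i in range(12)] + ["treatment"]
LABEL_KEY = "visit"


def to_supervised(example: Dict[str, Any]) -> Tuple[Dict[str, Any], Any]:
    """Split a full example into (features, label); 'conversion' is dropped."""
    features = {k: example[k] for k in INPUT_KEYS}
    return features, example[LABEL_KEY]


# Made-up example with every field from the FeaturesDict
example = {f"f{i}": 0.0 for i in range(12)}
example.update(treatment=1, exposure=True, visit=True, conversion=False)
features, label = to_supervised(example)
print(len(features), label)  # → 14 True
```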

  • Figure (tfds.show_examples): Not supported.

  • Citation:
@inproceedings{Diemert2018,
author = {Diemert, Eustache and Betlei, Artem and Renaudin, Christophe and Amini, Massih-Reza},
title={A Large Scale Benchmark for Uplift Modeling},
publisher = {ACM},
booktitle = {Proceedings of the AdKDD and TargetAd Workshop, KDD, London, United Kingdom, August 20, 2018},
year = {2018}
}