Uplifting with Decision Forests


Welcome to the Uplifting Tutorial for TensorFlow Decision Forests (TF-DF). In this tutorial, you will learn what uplifting is, why it is so important, and how to do it in TF-DF.

This tutorial assumes you are familiar with the fundamentals of TF-DF, in particular the installation procedure. The beginner tutorial is a great place to start learning about TF-DF.

In this colab, you will:

  • Learn what uplift modeling is.
  • Train an Uplift Random Forest model on the Hillstrom Email Marketing dataset.
  • Evaluate the quality of this model.

Installing TensorFlow Decision Forests

Install TF-DF by running the following cell.

Wurlitzer is needed to display the detailed training logs in Colabs (when using verbose=2 in the model constructor).

pip install tensorflow_decision_forests wurlitzer

Importing libraries

import tensorflow_decision_forests as tfdf

import os
import numpy as np
import pandas as pd
import tensorflow as tf
import math
import matplotlib.pyplot as plt
2023-10-03 11:11:04.771348: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-03 11:11:04.771393: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-03 11:11:04.771442: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

The hidden code cell limits the output height in colab.

# Check the version of TensorFlow Decision Forests
print("Found TensorFlow Decision Forests v" + tfdf.__version__)
Found TensorFlow Decision Forests v1.6.0

What is uplift modeling?

Uplift modeling is a statistical modeling technique to predict the incremental impact of an action on a subject. The action is often referred to as a treatment that may or may not be applied.

Uplift modeling is often used in targeted marketing campaigns to predict the increase in the likelihood of a person making a purchase (or any other desired action) based on the marketing exposure they receive.

For example, uplift modeling can predict the effect of an email. The effect is defined as the conditional probability
\begin{align}
\text{effect}(\text{email}) = &\Pr(\text{outcome}=\text{purchase}\ \vert\ \text{treatment}=\text{with email})\\
&- \Pr(\text{outcome}=\text{purchase}\ \vert\ \text{treatment}=\text{no email}),
\end{align}
where \(\Pr(\text{outcome}=\text{purchase}\ \vert\ ...)\) is the probability of purchase depending on whether or not an email was received.

Compare this to a classification model, which predicts the probability of a purchase: customers with a high predicted probability are likely to spend money in the store regardless of whether or not they received an email.

Similarly, one can use numerical uplifting to predict the numerical increase in spend when receiving an email. In comparison, a regression model can only predict the expected spend, which is a less useful metric in many cases.
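
As a toy illustration (all numbers below are invented for this example), the effect is simply the difference between the conversion rates of the treated and untreated groups:

# Hypothetical campaign results, invented for illustration.
p_purchase_with_email = 120 / 1000  # 120 of 1000 emailed customers purchased.
p_purchase_no_email = 90 / 1000     # 90 of 1000 non-emailed customers purchased.

effect = p_purchase_with_email - p_purchase_no_email
print(f"Estimated effect of the email: {effect:.3f}")  # 0.030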

Defining uplift models in TF-DF

TF-DF expects uplifting datasets to be presented in a "flat" format. A dataset of customers might look like this:

treatment | outcome | feature_1 | feature_2
----------|---------|-----------|----------
0         | 1       | 0.1       | blue
0         | 0       | 0.2       | blue
1         | 1       | 0.3       | blue
1         | 1       | 0.4       | blue

The treatment is a binary variable indicating whether or not the example has received treatment. In the above example, the treatment indicates if the customer has received an email or not. The outcome (label) indicates the status of the example after receiving the treatment (or not). TF-DF supports categorical outcomes for categorical uplifting and numerical outcomes for numerical uplifting.
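
As a sketch of this format (reusing the hypothetical columns above), such a dataset can be assembled in pandas and converted to a TensorFlow dataset; the treatment column stays among the input features and is identified later through the model constructor's uplift_treatment argument.

import pandas as pd
import tensorflow as tf

# A toy "flat" uplift dataset with the hypothetical columns shown above.
df = pd.DataFrame({
    "treatment": [0, 0, 1, 1],
    "outcome": [1, 0, 1, 1],
    "feature_1": [0.1, 0.2, 0.3, 0.4],
    "feature_2": ["blue", "blue", "blue", "blue"],
})

# The outcome is the label; everything else (treatment included) is an input.
toy_ds = tf.data.Dataset.from_tensor_slices(
    (dict(df.drop("outcome", axis=1)), df["outcome"].values)).batch(2)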

Training an uplifting model

In this example, we will use the Hillstrom Email Marketing dataset.

This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test:

  • 1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.
  • 1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.
  • 1/3 were randomly chosen to not receive an e-mail campaign.

During a period of two weeks following the e-mail campaign, results were tracked. The task is to tell if the Mens or Womens e-mail campaign was successful.

Read more about the dataset in its documentation. This tutorial uses the dataset as curated by TensorFlow Datasets.

# Install the TensorFlow Datasets package
pip install tensorflow-datasets -U --quiet
# Load the dataset
import tensorflow_datasets as tfds
raw_train, raw_test = tfds.load('hillstrom', split=['train[:80%]', 'train[80%:]'])

# Display the first 10 examples of the test fold.
pd.DataFrame(list(raw_test.batch(10).take(1))[0])
2023-10-03 11:11:10.733549: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2211] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-10-03 11:11:11.372447: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.

Dataset preprocessing

Since TF-DF currently only supports binary treatments, combine the "Men's Email" and "Women's Email" campaigns into a single treatment. This tutorial uses the binary variable conversion as the outcome, which makes this a Categorical Uplifting problem. If we were using the numerical variable spend, the problem would be a Numerical Uplifting problem.

def prepare_dataset(example):
  # Use a binary treatment class.
  example['treatment'] = 1 if example['segment'] == b'Mens E-Mail' or example['segment'] == b'Womens E-Mail' else 0
  outcome = example['conversion']
  # Restrict the dataset to the input features.
  input_features = ['channel', 'history', 'mens', 'womens', 'newbie', 'recency', 'zip_code', 'treatment']
  example = {feature: example[feature] for feature in input_features}
  return example, outcome

train_ds = raw_train.map(prepare_dataset).batch(100)
test_ds = raw_test.map(prepare_dataset).batch(100)

Model training

Finally, train and evaluate the model as usual. Note that TF-DF only supports Random Forest models for uplifting.

%set_cell_height 300

# Configure the model and its hyper-parameters.
model = tfdf.keras.RandomForestModel(
    verbose=2,
    task=tfdf.keras.Task.CATEGORICAL_UPLIFT,
    uplift_treatment='treatment'
)

# Train the model.
model.fit(train_ds)
<IPython.core.display.Javascript object>
Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.
Use /tmpfs/tmp/tmpkvr89ot3 as temporary training directory
Reading training dataset...
Training tensor examples:
Features: {'channel': <tf.Tensor 'data:0' shape=(None,) dtype=string>, 'history': <tf.Tensor 'data_1:0' shape=(None,) dtype=float32>, 'mens': <tf.Tensor 'data_2:0' shape=(None,) dtype=int64>, 'womens': <tf.Tensor 'data_3:0' shape=(None,) dtype=int64>, 'newbie': <tf.Tensor 'data_4:0' shape=(None,) dtype=int64>, 'recency': <tf.Tensor 'data_5:0' shape=(None,) dtype=int64>, 'zip_code': <tf.Tensor 'data_6:0' shape=(None,) dtype=string>, 'treatment': <tf.Tensor 'data_7:0' shape=(None,) dtype=int32>}
Label: Tensor("data_8:0", shape=(None,), dtype=int64)
Weights: None
Normalized tensor features:
 {'channel': SemanticTensor(semantic=<Semantic.CATEGORICAL: 2>, tensor=<tf.Tensor 'data:0' shape=(None,) dtype=string>), 'history': SemanticTensor(semantic=<Semantic.NUMERICAL: 1>, tensor=<tf.Tensor 'data_1:0' shape=(None,) dtype=float32>), 'mens': SemanticTensor(semantic=<Semantic.NUMERICAL: 1>, tensor=<tf.Tensor 'Cast:0' shape=(None,) dtype=float32>), 'womens': SemanticTensor(semantic=<Semantic.NUMERICAL: 1>, tensor=<tf.Tensor 'Cast_1:0' shape=(None,) dtype=float32>), 'newbie': SemanticTensor(semantic=<Semantic.NUMERICAL: 1>, tensor=<tf.Tensor 'Cast_2:0' shape=(None,) dtype=float32>), 'recency': SemanticTensor(semantic=<Semantic.NUMERICAL: 1>, tensor=<tf.Tensor 'Cast_3:0' shape=(None,) dtype=float32>), 'zip_code': SemanticTensor(semantic=<Semantic.CATEGORICAL: 2>, tensor=<tf.Tensor 'data_6:0' shape=(None,) dtype=string>)}
Training dataset read in 0:00:04.719923. Found 51200 examples.
Training model...
Standard output detected as not visible to the user e.g. running in a notebook. Creating a training log redirection. If training gets stuck, try calling tfdf.keras.set_training_logs_redirection(False).
[INFO 23-10-03 11:11:16.2703 UTC kernel.cc:773] Start Yggdrasil model training
[INFO 23-10-03 11:11:16.2703 UTC kernel.cc:774] Collect training examples
[INFO 23-10-03 11:11:16.2703 UTC kernel.cc:787] Dataspec guide:
column_guides {
  column_name_pattern: "^__LABEL$"
  type: CATEGORICAL
}
default_column_guide {
  categorial {
    max_vocab_count: 2000
  }
  discretized_numerical {
    maximum_num_bins: 255
  }
}
ignore_columns_without_guides: false
detect_numerical_as_discretized_numerical: false

[INFO 23-10-03 11:11:16.2707 UTC kernel.cc:393] Number of batches: 512
[INFO 23-10-03 11:11:16.2707 UTC kernel.cc:394] Number of examples: 51200
[INFO 23-10-03 11:11:16.2800 UTC kernel.cc:794] Training dataset:
Number of records: 51200
Number of columns: 9

Number of columns by type:
    NUMERICAL: 5 (55.5556%)
    CATEGORICAL: 4 (44.4444%)

Columns:

NUMERICAL: 5 (55.5556%)
    2: "history" NUMERICAL mean:241.833 min:29.99 max:3345.93 sd:255.292
    3: "mens" NUMERICAL mean:0.550391 min:0 max:1 sd:0.497454
    4: "newbie" NUMERICAL mean:0.503086 min:0 max:1 sd:0.49999
    5: "recency" NUMERICAL mean:5.75514 min:1 max:12 sd:3.50281
    7: "womens" NUMERICAL mean:0.549687 min:0 max:1 sd:0.497525

CATEGORICAL: 4 (44.4444%)
    0: "__LABEL" CATEGORICAL integerized vocab-size:3 no-ood-item
    1: "channel" CATEGORICAL has-dict vocab-size:4 zero-ood-items most-frequent:"Web" 22576 (44.0938%)
    6: "treatment" CATEGORICAL integerized vocab-size:3 no-ood-item
    8: "zip_code" CATEGORICAL has-dict vocab-size:4 zero-ood-items most-frequent:"Surburban" 22966 (44.8555%)

Terminology:
    nas: Number of non-available (i.e. missing) values.
    ood: Out of dictionary.
    manually-defined: Attribute which type is manually defined by the user i.e. the type was not automatically inferred.
    tokenized: The attribute value is obtained through tokenization.
    has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
    vocab-size: Number of unique values.

[INFO 23-10-03 11:11:16.2800 UTC kernel.cc:810] Configure learner
[INFO 23-10-03 11:11:16.2802 UTC kernel.cc:824] Training config:
learner: "RANDOM_FOREST"
features: "^channel$"
features: "^history$"
features: "^mens$"
features: "^newbie$"
features: "^recency$"
features: "^womens$"
features: "^zip_code$"
label: "^__LABEL$"
task: CATEGORICAL_UPLIFT
random_seed: 123456
uplift_treatment: "treatment"
metadata {
  framework: "TF Keras"
}
pure_serving_model: false
[yggdrasil_decision_forests.model.random_forest.proto.random_forest_config] {
  num_trees: 300
  decision_tree {
    max_depth: 16
    min_examples: 5
    in_split_min_examples_check: true
    keep_non_leaf_label_distribution: true
    num_candidate_attributes: 0
    missing_value_policy: GLOBAL_IMPUTATION
    allow_na_conditions: false
    categorical_set_greedy_forward {
      sampling: 0.1
      max_num_items: -1
      min_item_frequency: 1
    }
    growing_strategy_local {
    }
    categorical {
      cart {
      }
    }
    axis_aligned_split {
    }
    internal {
      sorting_strategy: PRESORTED
    }
    uplift {
      min_examples_in_treatment: 5
      split_score: KULLBACK_LEIBLER
    }
  }
  winner_take_all_inference: true
  compute_oob_performances: true
  compute_oob_variable_importances: false
  num_oob_variable_importances_permutations: 1
  bootstrap_training_dataset: true
  bootstrap_size_ratio: 1
  adapt_bootstrap_size_ratio_for_maximum_training_duration: false
  sampling_with_replacement: true
}

[INFO 23-10-03 11:11:16.2806 UTC kernel.cc:827] Deployment config:
cache_path: "/tmpfs/tmp/tmpkvr89ot3/working_cache"
num_threads: 32
try_resume_training: true

[INFO 23-10-03 11:11:16.2808 UTC kernel.cc:889] Train model
[INFO 23-10-03 11:11:16.2809 UTC random_forest.cc:416] Training random forest on 51200 example(s) and 7 feature(s).
[WARNING 23-10-03 11:11:16.4040 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.4058 UTC random_forest.cc:802] Training of tree  1/300 (tree index:28) done qini:0.000608425 auuc:0.00206948
[WARNING 23-10-03 11:11:16.4811 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.4858 UTC random_forest.cc:802] Training of tree  11/300 (tree index:1) done qini:7.44252e-05 auuc:0.00242451
[WARNING 23-10-03 11:11:16.5640 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.5666 UTC random_forest.cc:802] Training of tree  21/300 (tree index:22) done qini:4.22719e-05 auuc:0.00240438
[WARNING 23-10-03 11:11:16.6477 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.6521 UTC random_forest.cc:802] Training of tree  31/300 (tree index:13) done qini:8.03027e-05 auuc:0.00245679
[WARNING 23-10-03 11:11:16.7137 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.7161 UTC random_forest.cc:802] Training of tree  41/300 (tree index:38) done qini:8.50687e-05 auuc:0.00246156
[WARNING 23-10-03 11:11:16.7806 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.7833 UTC random_forest.cc:802] Training of tree  51/300 (tree index:49) done qini:-3.59235e-05 auuc:0.00234057
[WARNING 23-10-03 11:11:16.8648 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.8692 UTC random_forest.cc:802] Training of tree  61/300 (tree index:59) done qini:-0.000105298 auuc:0.00227119
[WARNING 23-10-03 11:11:16.9304 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.9329 UTC random_forest.cc:802] Training of tree  71/300 (tree index:68) done qini:-0.000137303 auuc:0.00223919
[WARNING 23-10-03 11:11:16.9970 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:16.9996 UTC random_forest.cc:802] Training of tree  81/300 (tree index:80) done qini:-8.23665e-05 auuc:0.00229412
[WARNING 23-10-03 11:11:17.0654 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.0682 UTC random_forest.cc:802] Training of tree  91/300 (tree index:91) done qini:-0.000220825 auuc:0.00215566
[WARNING 23-10-03 11:11:17.1524 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.1570 UTC random_forest.cc:802] Training of tree  101/300 (tree index:95) done qini:-0.000228188 auuc:0.0021483
[WARNING 23-10-03 11:11:17.2209 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.2235 UTC random_forest.cc:802] Training of tree  111/300 (tree index:108) done qini:-0.000288918 auuc:0.00208757
[WARNING 23-10-03 11:11:17.2774 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.2798 UTC random_forest.cc:802] Training of tree  121/300 (tree index:117) done qini:-0.000304144 auuc:0.00207234
[WARNING 23-10-03 11:11:17.3440 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.3463 UTC random_forest.cc:802] Training of tree  131/300 (tree index:129) done qini:-0.000216986 auuc:0.0021595
[WARNING 23-10-03 11:11:17.4250 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.4296 UTC random_forest.cc:802] Training of tree  141/300 (tree index:140) done qini:-0.000173193 auuc:0.0022033
[WARNING 23-10-03 11:11:17.4940 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.4966 UTC random_forest.cc:802] Training of tree  151/300 (tree index:151) done qini:-0.000152671 auuc:0.00222382
[WARNING 23-10-03 11:11:17.5521 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.5560 UTC random_forest.cc:802] Training of tree  161/300 (tree index:158) done qini:-0.000176023 auuc:0.00220047
[WARNING 23-10-03 11:11:17.6199 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.6225 UTC random_forest.cc:802] Training of tree  171/300 (tree index:171) done qini:-0.000151236 auuc:0.00222525
[WARNING 23-10-03 11:11:17.6565 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.6589 UTC random_forest.cc:802] Training of tree  196/300 (tree index:195) done qini:-0.000153745 auuc:0.00222274
[WARNING 23-10-03 11:11:17.8094 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.8143 UTC random_forest.cc:802] Training of tree  206/300 (tree index:205) done qini:-0.000105493 auuc:0.002271
[WARNING 23-10-03 11:11:17.8704 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.8730 UTC random_forest.cc:802] Training of tree  216/300 (tree index:208) done qini:-0.00012975 auuc:0.00224674
[WARNING 23-10-03 11:11:17.9298 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:17.9323 UTC random_forest.cc:802] Training of tree  226/300 (tree index:223) done qini:-0.000134271 auuc:0.00224222
[WARNING 23-10-03 11:11:18.0143 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.0189 UTC random_forest.cc:802] Training of tree  236/300 (tree index:233) done qini:-0.00011439 auuc:0.0022621
[WARNING 23-10-03 11:11:18.0843 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.0870 UTC random_forest.cc:802] Training of tree  246/300 (tree index:246) done qini:-0.000150459 auuc:0.00222603
[WARNING 23-10-03 11:11:18.1504 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.1529 UTC random_forest.cc:802] Training of tree  256/300 (tree index:248) done qini:-0.00013702 auuc:0.00223947
[WARNING 23-10-03 11:11:18.1913 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.1941 UTC random_forest.cc:802] Training of tree  280/300 (tree index:279) done qini:-0.000126474 auuc:0.00225001
[WARNING 23-10-03 11:11:18.3165 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.3189 UTC random_forest.cc:802] Training of tree  290/300 (tree index:287) done qini:-0.000183679 auuc:0.00219281
[WARNING 23-10-03 11:11:18.3762 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation
[INFO 23-10-03 11:11:18.3785 UTC random_forest.cc:802] Training of tree  300/300 (tree index:295) done qini:-0.000173259 auuc:0.00220323
[INFO 23-10-03 11:11:18.3818 UTC random_forest.cc:882] Final OOB metrics: qini:-0.000173259 auuc:0.00220323
[INFO 23-10-03 11:11:18.3984 UTC kernel.cc:926] Export model in log directory: /tmpfs/tmp/tmpkvr89ot3 with prefix d0d80b64ba754300
[INFO 23-10-03 11:11:18.4402 UTC kernel.cc:944] Save model in resources
[INFO 23-10-03 11:11:18.4426 UTC abstract_model.cc:881] Model self evaluation:
Number of predictions (without weights): 51200
Number of predictions (with weights): 51200
Task: CATEGORICAL_UPLIFT
Label: __LABEL

Number of treatments: 2
AUUC: 0.00220323
Qini: -0.000173259

[INFO 23-10-03 11:11:18.4697 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmpkvr89ot3/model/ with prefix d0d80b64ba754300
[INFO 23-10-03 11:11:18.6711 UTC decision_forest.cc:660] Model loaded with 300 root(s), 60190 node(s), and 7 input feature(s).
[INFO 23-10-03 11:11:18.6711 UTC abstract_model.cc:1343] Engine "RandomForestGeneric" built
[INFO 23-10-03 11:11:18.6711 UTC kernel.cc:1061] Use fast generic engine
Model trained in 0:00:02.419511
Compiling model...
Model compiled.
<keras.src.callbacks.History at 0x7f48442b2a60>

Evaluating uplift models

Metrics for Uplift models

The two most important metrics for evaluating uplift models are the AUUC (Area Under the Uplift Curve) metric and the Qini (Area Under the Qini Curve) metric. This is similar to the use of AUC and accuracy for classification problems. For both metrics, larger is better.

Neither AUUC nor Qini is a normalized metric: the best possible value can vary from dataset to dataset. This is different from, for example, the AUC metric, which always varies between 0 and 1.

A formal definition of AUUC is below. For more information about these metrics, see Guelman and Betlei et al.
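
As an illustration (not part of the original tutorial), the following sketch computes one common variant of the Qini curve from arrays of outcomes, treatment indicators, and predicted uplift scores. Definitions vary across the literature, so this is not necessarily the exact formula used by TF-DF internally.

import numpy as np

def qini_curve(outcomes, treatment, predictions):
  # Sort the examples by predicted uplift, best first.
  order = np.argsort(-predictions)
  y = outcomes[order].astype(float)
  t = treatment[order].astype(float)
  # Cumulative conversions in the treatment and control groups.
  cum_treated = np.cumsum(y * t)
  cum_control = np.cumsum(y * (1.0 - t))
  n_treated = np.cumsum(t)
  n_control = np.maximum(np.cumsum(1.0 - t), 1.0)  # Avoid division by zero.
  # Control conversions are rescaled by the treated/control ratio seen so far.
  return cum_treated - cum_control * n_treated / n_control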

Model Self-Evaluation

TF-DF Random Forest models perform self-evaluation on the out-of-bag examples of the training dataset. For uplift models, this self-evaluation exposes the AUUC and the Qini metrics. You can retrieve both metrics on the training dataset directly through the model inspector.

Later, we are going to recompute the AUUC metric "manually" on the test dataset. Note that the two metrics are not expected to be exactly equal (out-of-bag on the training dataset vs. the test dataset), since the AUUC is not a normalized metric.

# The self-evaluation is available through the model inspector
insp = model.make_inspector()
insp.evaluation()
Evaluation(num_examples=51200, accuracy=None, loss=None, rmse=None, ndcg=None, aucs=None, auuc=0.0022032303161204467, qini=-0.00017325876815314604)

Manually computing the AUUC

In this section, we manually compute the AUUC and plot the uplift curves.

The next few paragraphs explain the AUUC metric in more detail and may be skipped.

Computing the AUUC

Suppose you have a labeled dataset with \(|T|\) examples with treatment and \(|C|\) examples without treatment, called control examples. For each example, the uplift model \(f\) produces the conditional probability that a treatment on the example will yield a positive outcome.

Suppose a decision-maker needs to decide which clients to send an email to, using an uplift model \(f\). The model produces a (conditional) probability that the email will result in a conversion. The decision-maker might therefore just pick a number \(k\) of emails to send and send those \(k\) emails to the clients with the highest probability.
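
A minimal sketch of that selection step (the helper name is hypothetical, and predictions stands for the per-client scores computed later in this tutorial):

import numpy as np

def select_clients_to_email(predictions, k):
  # Indices of the k clients with the highest predicted scores.
  return np.argsort(-predictions)[:k]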

Using a labeled test dataset, it is possible to study the impact of \(k\) on the success of the campaign. For a given \(k\), we are interested in the ratio of the number of treated clients among the top \(k\) predictions that converted to the total number \(|T|\) of treated clients. We plot this ratio against \(k\).

Ideally, we would like this curve to increase steeply. This would mean that the model prioritizes sending emails to those clients that will generate a conversion when receiving an email.

# Compute all predictions on the test dataset
predictions = model.predict(test_ds).flatten()
# Extract outcomes and treatments
outcomes = np.concatenate([outcome.numpy() for _, outcome in test_ds])
treatment = np.concatenate([example['treatment'].numpy() for example,_ in test_ds])
control = 1 - treatment

num_treatments = np.sum(treatment)
# Clients without treatment are called 'control' group
num_control = np.sum(control)
num_examples = len(predictions)

# Sort labels and treatments according to predictions in descending order
prediction_order = predictions.argsort()[::-1]
outcomes_sorted = outcomes[prediction_order]
treatment_sorted = treatment[prediction_order]
control_sorted = control[prediction_order]
ratio_treatment = np.cumsum(np.multiply(outcomes_sorted, treatment_sorted), axis=0)/num_treatments

fig, ax = plt.subplots()
ax.plot(ratio_treatment, label='Conversion ratio of treatment')
ax.set_xlabel('k')
ax.set_ylabel('Ratio of conversion')
ax.legend()
512/512 [==============================] - 3s 5ms/step
<matplotlib.legend.Legend at 0x7f482c2edd60>

[Figure: conversion ratio of the treatment group plotted against k]

Similarly, we can also compute and plot the conversion ratio of those not receiving an email, called the control group. Ideally, this curve is initially flat: this would mean that the model does not prioritize sending emails to clients that will generate a conversion despite not receiving an email.

ratio_control = np.cumsum(np.multiply(outcomes_sorted, control_sorted), axis=0)/num_control
ax.plot(ratio_control, label='Conversion ratio of control')
ax.legend()
fig

[Figure: conversion ratios of the treatment and control groups plotted against k]

The AUUC metric measures the area between these two curves, with the x-axis normalized between 0 and 1:

x = np.linspace(0, 1, num_examples)
plt.plot(x,ratio_treatment, label='Conversion ratio of treatment')
plt.plot(x,ratio_control, label='Conversion ratio of control')
plt.fill_between(x, ratio_treatment, ratio_control, where=(ratio_treatment > ratio_control), color='C0', alpha=0.3)
plt.fill_between(x, ratio_treatment, ratio_control, where=(ratio_treatment < ratio_control), color='C1', alpha=0.3)
plt.xlabel('k')
plt.ylabel('Ratio of conversion')
plt.legend()

# Approximate the integral of the difference between the two curves.
auuc = np.trapz(ratio_treatment-ratio_control, dx=1/num_examples)
print(f'The AUUC on the test dataset is {auuc}')
The AUUC on the test dataset is 0.007513928513572819

[Figure: shaded area between the treatment and control conversion curves, i.e. the AUUC]