Introduction to Fairness Indicators


Overview

Fairness Indicators is a suite of tools built on top of TensorFlow Model Analysis (TFMA) that enables regular evaluation of fairness metrics in product pipelines. TFMA is a library for evaluating both TensorFlow and non-TensorFlow machine learning models. It allows you to evaluate your models on large amounts of data in a distributed manner, compute in-graph and other metrics over different slices of data, and visualize the results in notebooks.

Fairness Indicators is packaged with TensorFlow Data Validation (TFDV) and the What-If Tool. Using Fairness Indicators allows you to:

  • Evaluate model performance, sliced across defined groups of users
  • Gain confidence about results with confidence intervals and evaluations at multiple thresholds
  • Evaluate the distribution of datasets
  • Dive deep into individual slices to explore root causes and opportunities for improvement

In this notebook, you will use Fairness Indicators to fix fairness issues in a model you train using the Civil Comments dataset. Watch this video for more details and context on the real-world scenario it is based on, which is also one of the primary motivations for creating Fairness Indicators.

Dataset

In this notebook, you will work with the Civil Comments dataset: approximately 2 million public comments released by the Civil Comments platform in 2017 for ongoing research. This effort was sponsored by Jigsaw, who have hosted competitions on Kaggle to help classify toxic comments and minimize unintended model bias.

Each individual text comment in the dataset has a toxicity label, with the label being 1 if the comment is toxic and 0 if the comment is non-toxic. Within the data, a subset of comments are labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity.

Setup

Install fairness-indicators and witwidget.

pip install -q -U pip==20.2

pip install -q fairness-indicators
pip install -q witwidget

You must restart the Colab runtime after installing. Select Runtime > Restart runtime from the Colab menu.

Do not proceed with the rest of this tutorial without first restarting the runtime.

Import all other required libraries.

import os
import tempfile
import apache_beam as beam
import numpy as np
import pandas as pd
from datetime import datetime
import pprint

from google.protobuf import text_format

import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv

from tfx_bsl.tfxio import tensor_adapter
from tfx_bsl.tfxio import tf_example_record

from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view

from fairness_indicators.tutorial_utils import util

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

from tensorflow_metadata.proto.v0 import schema_pb2

Download and analyze the data

By default, this notebook downloads a preprocessed version of this dataset, but you may use the original dataset and re-run the processing steps if desired. In the original dataset, each comment is labeled with the percentage of raters who believed that a comment corresponds to a particular identity. For example, a comment might be labeled with the following: { male: 0.3, female: 1.0, transgender: 0.0, heterosexual: 0.8, homosexual_gay_or_lesbian: 1.0 }. The processing step groups identities by category (gender, sexual_orientation, etc.) and removes identities with a score less than 0.5. So the example above would be converted to the following: { gender: [female], sexual_orientation: [heterosexual, homosexual_gay_or_lesbian] }
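The grouping-and-thresholding step described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration only; the tutorial's actual conversion lives in `util.convert_comments_data`, and the category mapping here is an assumption covering just the identities in the example:

```python
# Hypothetical sketch of the preprocessing step: group raw identity scores
# by category and keep only identities whose rater score is at least 0.5.
IDENTITY_COLUMNS = {
    'gender': ['male', 'female', 'transgender'],
    'sexual_orientation': ['heterosexual', 'homosexual_gay_or_lesbian'],
}

def group_identities(scores, threshold=0.5):
    grouped = {}
    for category, identities in IDENTITY_COLUMNS.items():
        kept = [i for i in identities if scores.get(i, 0.0) >= threshold]
        if kept:
            grouped[category] = kept
    return grouped

example = {'male': 0.3, 'female': 1.0, 'transgender': 0.0,
           'heterosexual': 0.8, 'homosexual_gay_or_lesbian': 1.0}
print(group_identities(example))
# {'gender': ['female'], 'sexual_orientation': ['heterosexual', 'homosexual_gay_or_lesbian']}
```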

download_original_data = False

if download_original_data:
  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

  # The identity terms will be grouped together by their categories
  # (see 'IDENTITY_COLUMNS') at a threshold of 0.5. Only the identity term,
  # text, and label columns are kept after processing.
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)

else:
  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')
Downloading data from https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord
488161280/488153424 [==============================] - 11s 0us/step
Downloading data from https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord
324943872/324941336 [==============================] - 9s 0us/step

Use TFDV to analyze the data and find potential problems in it, such as missing values and data imbalances, that can lead to fairness disparities.

stats = tfdv.generate_statistics_from_tfrecord(data_location=train_tf_file)
tfdv.visualize_statistics(stats)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_data_validation/utils/statistics_io_impl.py:100: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

TFDV shows that there are some significant imbalances in the data which could lead to biased model outcomes.

  • The toxicity label (the value predicted by the model) is unbalanced. Only 8% of the examples in the training set are toxic, which means that a classifier could get 92% accuracy by predicting that all comments are non-toxic.

  • In the fields relating to identity terms, only 6.6k out of the 1.08 million (0.61%) training examples deal with homosexuality, and those related to bisexuality are even more rare. This indicates that performance on these slices may suffer due to lack of training data.

Prepare the data

Define a feature map to parse the data. Each example will have a label, comment text, and identity features (sexual orientation, gender, religion, race, and disability) associated with the text.

BASE_DIR = tempfile.gettempdir()

TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'
FEATURE_MAP = {
    # Label:
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    # Text:
    TEXT_FEATURE:  tf.io.FixedLenFeature([], tf.string),

    # Identities:
    'sexual_orientation':tf.io.VarLenFeature(tf.string),
    'gender':tf.io.VarLenFeature(tf.string),
    'religion':tf.io.VarLenFeature(tf.string),
    'race':tf.io.VarLenFeature(tf.string),
    'disability':tf.io.VarLenFeature(tf.string),
}

Next, set up an input function to feed data into the model. Add a weight column to each example and upweight the toxic examples to account for the class imbalance identified with TFDV. Identity features are used only during the evaluation phase, as only the comment text is fed into the model during training.

def train_input_fn():
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example,
            parsed_example[LABEL])
  train_dataset = tf.data.TFRecordDataset(
      filenames=[train_tf_file]).map(parse_function).batch(512)
  return train_dataset
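The weighting scheme in `parse_function` above is simple enough to check by hand: since the weight is `label + 0.1`, toxic examples (label 1.0) receive weight 1.1 and non-toxic examples (label 0.0) receive weight 0.1, so toxic examples count roughly 11x in the weighted loss. A plain-Python check (illustrative only; `example_weight` is a hypothetical helper, not part of the tutorial):

```python
# Plain-Python check of the weight column computed in parse_function:
# weight = toxicity label + 0.1.
def example_weight(label):
    return label + 0.1

toxic, non_toxic = example_weight(1.0), example_weight(0.0)
print(round(toxic / non_toxic, 6))  # 11.0
```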

Train the model

Create and train a deep learning model on the data.

model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(
    "%Y%m%d-%H%M%S"))

embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    n_classes=2,
    model_dir=model_dir)

classifier.train(input_fn=train_input_fn, steps=1000)
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20220401-011742', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:397: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
2022-04-01 01:17:42.840683: W tensorflow/core/common_runtime/graph_constructor.cc:1511] Importing a graph with a lower producer version 26 into an existing graph with producer version 987. Shape inference will have run different parts of the graph with different producer versions.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:512: NumericColumn._get_dense_tensor (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/feature_column/feature_column.py:2188: NumericColumn._transform_feature (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/optimizer_v2/adagrad.py:84: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20220401-011742/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 58.530106, step = 0
INFO:tensorflow:global_step/sec: 100.2
INFO:tensorflow:loss = 56.24185, step = 100 (1.000 sec)
INFO:tensorflow:global_step/sec: 111.495
INFO:tensorflow:loss = 47.095863, step = 200 (0.897 sec)
INFO:tensorflow:global_step/sec: 112.063
INFO:tensorflow:loss = 55.677044, step = 300 (0.892 sec)
INFO:tensorflow:global_step/sec: 112.57
INFO:tensorflow:loss = 55.89222, step = 400 (0.889 sec)
INFO:tensorflow:global_step/sec: 110.006
INFO:tensorflow:loss = 41.648792, step = 500 (0.909 sec)
INFO:tensorflow:global_step/sec: 110.513
INFO:tensorflow:loss = 45.61808, step = 600 (0.905 sec)
INFO:tensorflow:global_step/sec: 112.917
INFO:tensorflow:loss = 51.14008, step = 700 (0.885 sec)
INFO:tensorflow:global_step/sec: 114.308
INFO:tensorflow:loss = 47.67978, step = 800 (0.875 sec)
INFO:tensorflow:global_step/sec: 113.203
INFO:tensorflow:loss = 47.92558, step = 900 (0.883 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000...
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20220401-011742/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000...
INFO:tensorflow:Loss for final step: 50.775986.
<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7efdf477f0d0>

Analyze the model

After obtaining the trained model, analyze it to compute fairness metrics using TFMA and Fairness Indicators. Begin by exporting the model as a SavedModel.

Export SavedModel

def eval_input_receiver_fn():
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')

  # This *must* be a dictionary containing a single key 'examples', which
  # points to the input placeholder.
  receiver_tensors = {'examples': serialized_tf_example}

  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])

  return tfma.export.EvalInputReceiver(
    features=features,
    receiver_tensors=receiver_tensors,
    labels=features[LABEL])

tfma_export_dir = tfma.export.export_eval_savedmodel(
  estimator=classifier,
  export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),
  eval_input_receiver_fn=eval_input_receiver_fn)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/encoding.py:132: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
2022-04-01 01:17:55.990572: W tensorflow/core/common_runtime/graph_constructor.cc:1511] Importing a graph with a lower producer version 26 into an existing graph with producer version 987. Shape inference will have run different parts of the graph with different producer versions.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
WARNING:tensorflow:Export includes no default signature!
INFO:tensorflow:Restoring parameters from /tmp/train/20220401-011742/model.ckpt-1000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfma_eval_model/temp-1648775875/assets
INFO:tensorflow:SavedModel written to: /tmp/tfma_eval_model/temp-1648775875/saved_model.pb

Compute Fairness Metrics

Select the identity to compute metrics for and whether to run with confidence intervals using the dropdown in the panel on the right.

Fairness Indicators Computation Options

Slice selection: sexual_orientation
Compute confidence intervals: False
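The evaluation cell that produces `eval_result` is driven by the form options above. A sketch of how such a cell could look with the EvalSavedModel-era TFMA API is shown below; the output path, variable names, and exact options are assumptions, not verbatim tutorial code:

```python
# Sketch (assumed, not verbatim): run TFMA with Fairness Indicators over the
# validation set, sliced overall and by the selected identity column.
tfma_eval_result_path = os.path.join(BASE_DIR, 'tfma_eval_result')

slice_selection = 'sexual_orientation'
compute_confidence_intervals = False

# Slice the data overall and by each value of the selected identity column.
slice_spec = [
    tfma.slicer.SingleSliceSpec(),
    tfma.slicer.SingleSliceSpec(columns=[slice_selection]),
]

# Attach Fairness Indicators metrics at several decision thresholds.
add_metrics_callbacks = [
    tfma.post_export_metrics.fairness_indicators(
        thresholds=[0.1, 0.3, 0.5, 0.7, 0.9],
        labels_key=LABEL),
]

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=tfma_export_dir,
    add_metrics_callbacks=add_metrics_callbacks)

eval_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    data_location=validate_tf_file,
    file_format='tfrecords',
    slice_spec=slice_spec,
    output_path=tfma_eval_result_path,
    compute_confidence_intervals=compute_confidence_intervals)
```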
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:164: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:Restoring parameters from /tmp/tfma_eval_model/1648775875/variables/variables
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/graph_ref.py:184: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info.
WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching: -*-of-%(num_shards)05d

Visualize data using the What-if Tool

In this section, you'll use the What-If Tool's interactive visual interface to explore and manipulate data at a micro-level.

Each point on the scatter plot on the right-hand panel represents one of the examples in the subset loaded into the tool. Click on one of the points to see details about this particular example in the left-hand panel. The comment text, ground truth toxicity, and applicable identities are shown. At the bottom of this left-hand panel, you see the inference results from the model you just trained.

Modify the text of the example and then click the Run inference button to view how your changes caused the perceived toxicity prediction to change.

DEFAULT_MAX_EXAMPLES = 1000

# Load 100000 examples in memory. When first rendered, 
# What-If Tool should only display 1000 of these due to browser constraints.
def wit_dataset(file, num_examples=100000):
  dataset = tf.data.TFRecordDataset(
      filenames=[file]).take(num_examples)
  return [tf.train.Example.FromString(d.numpy()) for d in dataset]

wit_data = wit_dataset(train_tf_file)
config_builder = WitConfigBuilder(wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(
    classifier, FEATURE_MAP).set_label_vocab(['non-toxicity', LABEL]).set_target_feature(LABEL)
wit = WitWidget(config_builder)

Render Fairness Indicators

Render the Fairness Indicators widget with the exported evaluation results.

Below you will see bar charts displaying performance of each slice of the data on selected metrics. You can adjust the baseline comparison slice as well as the displayed threshold(s) using the dropdown menus at the top of the visualization.

The Fairness Indicator widget is integrated with the What-If Tool rendered above. If you select one slice of the data in the bar chart, the What-If Tool will update to show you examples from the selected slice. When the data reloads in the What-If Tool above, try modifying Color By to toxicity. This can give you a visual understanding of the toxicity balance of examples by slice.

event_handlers={'slice-selected':
                wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}
widget_view.render_fairness_indicator(eval_result=eval_result,
                                      slicing_column=slice_selection,
                                      event_handlers=event_handlers
                                      )
FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Overall', 'slice': 'Overall', 'metrics': {'average_los…

With this particular dataset and task, systematically higher false positive and false negative rates for certain identities can lead to negative consequences. For example, in a content moderation system, a higher-than-overall false positive rate for a certain group can lead to those voices being silenced. Thus, it is important to regularly evaluate these types of criteria as you develop and improve models, and utilize tools such as Fairness Indicators, TFDV, and WIT to help illuminate potential problems. Once you've identified fairness issues, you can experiment with new data sources, data balancing, or other techniques to improve performance on underperforming groups.

See here for more information and guidance on how to use Fairness Indicators.

Use fairness evaluation results

The eval_result object, rendered above in render_fairness_indicator(), has its own API that you can leverage to read TFMA results into your programs.

Get evaluated slices and metrics

Use get_slice_names() and get_metric_names() to get the evaluated slices and metrics, respectively.

pp = pprint.PrettyPrinter()

print("Slices:")
pp.pprint(eval_result.get_slice_names())
print("\nMetrics:")
pp.pprint(eval_result.get_metric_names())
Slices:
[(),
 (('sexual_orientation', 'homosexual_gay_or_lesbian'),),
 (('sexual_orientation', 'heterosexual'),),
 (('sexual_orientation', 'bisexual'),),
 (('sexual_orientation', 'other_sexual_orientation'),)]

Metrics:
['fairness_indicators_metrics/false_negative_rate@0.9',
 'fairness_indicators_metrics/false_omission_rate@0.5',
 'fairness_indicators_metrics/false_discovery_rate@0.7',
 'fairness_indicators_metrics/false_negative_rate@0.7',
 'fairness_indicators_metrics/negative_rate@0.5',
 'prediction/mean',
 'fairness_indicators_metrics/true_positive_rate@0.9',
 'fairness_indicators_metrics/false_omission_rate@0.9',
 'fairness_indicators_metrics/true_negative_rate@0.1',
 'fairness_indicators_metrics/negative_rate@0.7',
 'fairness_indicators_metrics/false_omission_rate@0.3',
 'fairness_indicators_metrics/false_positive_rate@0.1',
 'accuracy_baseline',
 'fairness_indicators_metrics/false_negative_rate@0.1',
 'post_export_metrics/example_count',
 'fairness_indicators_metrics/true_negative_rate@0.5',
 'fairness_indicators_metrics/false_discovery_rate@0.1',
 'fairness_indicators_metrics/false_positive_rate@0.5',
 'fairness_indicators_metrics/false_omission_rate@0.1',
 'fairness_indicators_metrics/positive_rate@0.5',
 'fairness_indicators_metrics/true_negative_rate@0.3',
 'fairness_indicators_metrics/false_positive_rate@0.9',
 'fairness_indicators_metrics/negative_rate@0.3',
 'recall',
 'fairness_indicators_metrics/true_positive_rate@0.3',
 'label/mean',
 'precision',
 'fairness_indicators_metrics/false_positive_rate@0.3',
 'fairness_indicators_metrics/true_negative_rate@0.7',
 'fairness_indicators_metrics/negative_rate@0.9',
 'fairness_indicators_metrics/true_negative_rate@0.9',
 'auc',
 'fairness_indicators_metrics/positive_rate@0.1',
 'accuracy',
 'average_loss',
 'fairness_indicators_metrics/false_positive_rate@0.7',
 'fairness_indicators_metrics/true_positive_rate@0.1',
 'fairness_indicators_metrics/false_discovery_rate@0.5',
 'fairness_indicators_metrics/false_negative_rate@0.3',
 'fairness_indicators_metrics/true_positive_rate@0.7',
 'fairness_indicators_metrics/false_discovery_rate@0.9',
 'fairness_indicators_metrics/positive_rate@0.9',
 'fairness_indicators_metrics/positive_rate@0.7',
 'fairness_indicators_metrics/false_discovery_rate@0.3',
 'auc_precision_recall',
 'fairness_indicators_metrics/false_omission_rate@0.7',
 'fairness_indicators_metrics/false_negative_rate@0.5',
 'fairness_indicators_metrics/negative_rate@0.1',
 'fairness_indicators_metrics/true_positive_rate@0.5',
 'fairness_indicators_metrics/positive_rate@0.3']

Use get_metrics_for_slice() to get the metrics for a particular slice as a dictionary mapping metric names to metric values.

baseline_slice = ()
heterosexual_slice = (('sexual_orientation', 'heterosexual'),)

print("Baseline metric values:")
pp.pprint(eval_result.get_metrics_for_slice(baseline_slice))
print("\nHeterosexual metric values:")
pp.pprint(eval_result.get_metrics_for_slice(heterosexual_slice))
Baseline metric values:
{'accuracy': {'doubleValue': 0.7189999222755432},
 'accuracy_baseline': {'doubleValue': 0.9198060631752014},
 'auc': {'doubleValue': 0.7971373200416565},
 'auc_precision_recall': {'doubleValue': 0.3016054034233093},
 'average_loss': {'doubleValue': 0.5599656105041504},
 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.9138781619518036},
 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.8790012672063943},
 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.8159587822994037},
 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7085680337968667},
 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4829387755102041},
 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.006183501450877435},
 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.08796808069642117},
 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.27072682050573443},
 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.5424554373359126},
 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8905969324305651},
 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.006648096564531105},
 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.017835318342747684},
 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.03182318378020604},
 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.04976753178016461},
 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.07265323376074398},
 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9194463100892397},
 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.5776488056694185},
 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28189574944206347},
 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.09698910028401304},
 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.008908914034099637},
 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.07458965302306254},
 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.3955357019184154},
 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.6822273010596301},
 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.8740965440820001},
 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9830320659325438},
 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9254103469769375},
 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.6044642980815846},
 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.31777269894036986},
 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.12590345591799987},
 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.016967934067456194},
 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.08055368991076027},
 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.42235119433058155},
 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.7181042505579366},
 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.9030108997159869},
 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9910910859659003},
 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9938164985491226},
 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9120319193035789},
 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7292731794942656},
 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.45754456266408733},
 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.10940306756943485},
 'label/mean': {'doubleValue': 0.08019392192363739},
 'post_export_metrics/example_count': {'doubleValue': 721950.0},
 'precision': {'doubleValue': 0.18404121696949005},
 'prediction/mean': {'doubleValue': 0.3986806869506836},
 'recall': {'doubleValue': 0.7292732000350952}}

Heterosexual metric values:
{'accuracy': {'doubleValue': 0.5304877758026123},
 'accuracy_baseline': {'doubleValue': 0.7601625919342041},
 'auc': {'doubleValue': 0.669174313545227},
 'auc_precision_recall': {'doubleValue': 0.40801435708999634},
 'average_loss': {'doubleValue': 0.8247650861740112},
 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7541666666666667},
 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7291139240506329},
 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6996466431095406},
 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.6453488372093024},
 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.37142857142857144},
 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0},
 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.09322033898305085},
 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.2796610169491525},
 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4830508474576271},
 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8135593220338984},
 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0},
 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.1134020618556701},
 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.15789473684210525},
 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.178125},
 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.2100656455142232},
 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9679144385026738},
 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7700534759358288},
 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5294117647058824},
 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.2967914438502674},
 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.034759358288770054},
 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.024390243902439025},
 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.19715447154471544},
 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4247967479674797},
 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6504065040650406},
 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9288617886178862},
 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.975609756097561},
 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8028455284552846},
 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5752032520325203},
 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.34959349593495936},
 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.07113821138211382},
 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03208556149732621},
 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.22994652406417113},
 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.47058823529411764},
 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7032085561497327},
 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9652406417112299},
 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0},
 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9067796610169492},
 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7203389830508474},
 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5169491525423728},
 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.1864406779661017},
 'label/mean': {'doubleValue': 0.2398373931646347},
 'post_export_metrics/example_count': {'doubleValue': 492.0},
 'precision': {'doubleValue': 0.30035334825515747},
 'prediction/mean': {'doubleValue': 0.5564622282981873},
 'recall': {'doubleValue': 0.7203390002250671}}
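The printed dictionaries can also be compared programmatically. Below is a minimal sketch (not part of the notebook) that unwraps TFMA's {'doubleValue': ...} scalar representation and computes a disparity ratio for the false positive rate at threshold 0.5; the abbreviated dictionaries are hard-coded stand-ins for the output shown above, where in practice you would use the dictionaries returned by eval_result.get_metrics_for_slice().

```python
# Abbreviated stand-ins for the slice metric dictionaries printed above.
baseline_metrics = {
    'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28189574944206347},
}
heterosexual_metrics = {
    'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5294117647058824},
}

def metric_value(metrics, name):
    """Unwrap the {'doubleValue': ...} structure used for scalar metrics."""
    return metrics[name]['doubleValue']

fpr_name = 'fairness_indicators_metrics/false_positive_rate@0.5'
baseline_fpr = metric_value(baseline_metrics, fpr_name)
slice_fpr = metric_value(heterosexual_metrics, fpr_name)

# A ratio above 1 means non-toxic comments in this slice are flagged as
# toxic more often than in the overall (baseline) dataset.
disparity = slice_fpr / baseline_fpr
print(f'FPR@0.5 disparity (heterosexual vs. baseline): {disparity:.2f}x')
```

Here the comments referencing non-toxic comments in the heterosexual slice are nearly twice as likely to be incorrectly flagged as toxic as in the baseline, which is the kind of gap Fairness Indicators is designed to surface.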

Use get_metrics_for_all_slices() to get the metrics for every slice at once, returned as a dictionary that maps each slice to the metrics dictionary you would obtain by calling get_metrics_for_slice() on that slice.

pp.pprint(eval_result.get_metrics_for_all_slices())
{(): {'accuracy': {'doubleValue': 0.7189999222755432},
      'accuracy_baseline': {'doubleValue': 0.9198060631752014},
      'auc': {'doubleValue': 0.7971373200416565},
      'auc_precision_recall': {'doubleValue': 0.3016054034233093},
      'average_loss': {'doubleValue': 0.5599656105041504},
      'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.9138781619518036},
      'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.8790012672063943},
      'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.8159587822994037},
      'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7085680337968667},
      'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4829387755102041},
      'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.006183501450877435},
      'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.08796808069642117},
      'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.27072682050573443},
      'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.5424554373359126},
      'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8905969324305651},
      'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.006648096564531105},
      'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.017835318342747684},
      'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.03182318378020604},
      'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.04976753178016461},
      'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.07265323376074398},
      'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9194463100892397},
      'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.5776488056694185},
      'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28189574944206347},
      'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.09698910028401304},
      'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.008908914034099637},
      'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.07458965302306254},
      'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.3955357019184154},
      'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.6822273010596301},
      'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.8740965440820001},
      'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9830320659325438},
      'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9254103469769375},
      'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.6044642980815846},
      'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.31777269894036986},
      'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.12590345591799987},
      'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.016967934067456194},
      'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.08055368991076027},
      'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.42235119433058155},
      'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.7181042505579366},
      'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.9030108997159869},
      'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9910910859659003},
      'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9938164985491226},
      'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9120319193035789},
      'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7292731794942656},
      'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.45754456266408733},
      'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.10940306756943485},
      'label/mean': {'doubleValue': 0.08019392192363739},
      'post_export_metrics/example_count': {'doubleValue': 721950.0},
      'precision': {'doubleValue': 0.18404121696949005},
      'prediction/mean': {'doubleValue': 0.3986806869506836},
      'recall': {'doubleValue': 0.7292732000350952}},
 (('sexual_orientation', 'bisexual'),): {'accuracy': {'doubleValue': 0.5517241358757019},
                                         'accuracy_baseline': {'doubleValue': 0.8017241358757019},
                                         'auc': {'doubleValue': 0.6257596611976624},
                                         'auc_precision_recall': {'doubleValue': 0.3276112675666809},
                                         'average_loss': {'doubleValue': 0.7453476190567017},
                                         'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7870370370370371},
                                         'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7790697674418605},
                                         'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7543859649122807},
                                         'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.68},
                                         'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.5},
                                         'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0},
                                         'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.17391304347826086},
                                         'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.391304347826087},
                                         'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.6521739130434783},
                                         'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.9565217391304348},
                                         'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0},
                                         'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.13333333333333333},
                                         'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.15254237288135594},
                                         'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.16483516483516483},
                                         'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.19298245614035087},
                                         'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9139784946236559},
                                         'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7204301075268817},
                                         'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.46236559139784944},
                                         'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.1827956989247312},
                                         'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.010752688172043012},
                                         'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.06896551724137931},
                                         'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.25862068965517243},
                                         'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.5086206896551724},
                                         'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.7844827586206896},
                                         'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9827586206896551},
                                         'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9310344827586207},
                                         'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.7413793103448276},
                                         'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.49137931034482757},
                                         'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.21551724137931033},
                                         'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.017241379310344827},
                                         'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.08602150537634409},
                                         'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.27956989247311825},
                                         'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5376344086021505},
                                         'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.8172043010752689},
                                         'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.989247311827957},
                                         'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0},
                                         'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.8260869565217391},
                                         'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.6086956521739131},
                                         'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.34782608695652173},
                                         'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.043478260869565216},
                                         'label/mean': {'doubleValue': 0.1982758641242981},
                                         'post_export_metrics/example_count': {'doubleValue': 116.0},
                                         'precision': {'doubleValue': 0.24561403691768646},
                                         'prediction/mean': {'doubleValue': 0.49021556973457336},
                                         'recall': {'doubleValue': 0.6086956262588501}},
 (('sexual_orientation', 'heterosexual'),): {'accuracy': {'doubleValue': 0.5304877758026123},
                                             'accuracy_baseline': {'doubleValue': 0.7601625919342041},
                                             'auc': {'doubleValue': 0.669174313545227},
                                             'auc_precision_recall': {'doubleValue': 0.40801435708999634},
                                             'average_loss': {'doubleValue': 0.8247650861740112},
                                             'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7541666666666667},
                                             'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7291139240506329},
                                             'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6996466431095406},
                                             'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.6453488372093024},
                                             'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.37142857142857144},
                                             'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0},
                                             'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.09322033898305085},
                                             'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.2796610169491525},
                                             'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4830508474576271},
                                             'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8135593220338984},
                                             'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0},
                                             'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.1134020618556701},
                                             'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.15789473684210525},
                                             'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.178125},
                                             'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.2100656455142232},
                                             'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9679144385026738},
                                             'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7700534759358288},
                                             'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5294117647058824},
                                             'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.2967914438502674},
                                             'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.034759358288770054},
                                             'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.024390243902439025},
                                             'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.19715447154471544},
                                             'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4247967479674797},
                                             'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6504065040650406},
                                             'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9288617886178862},
                                             'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.975609756097561},
                                             'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8028455284552846},
                                             'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5752032520325203},
                                             'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.34959349593495936},
                                             'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.07113821138211382},
                                             'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03208556149732621},
                                             'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.22994652406417113},
                                             'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.47058823529411764},
                                             'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7032085561497327},
                                             'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9652406417112299},
                                             'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0},
                                             'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9067796610169492},
                                             'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7203389830508474},
                                             'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5169491525423728},
                                             'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.1864406779661017},
                                             'label/mean': {'doubleValue': 0.2398373931646347},
                                             'post_export_metrics/example_count': {'doubleValue': 492.0},
                                             'precision': {'doubleValue': 0.30035334825515747},
                                             'prediction/mean': {'doubleValue': 0.5564622282981873},
                                             'recall': {'doubleValue': 0.7203390002250671}},
 (('sexual_orientation', 'homosexual_gay_or_lesbian'),): {'accuracy': {'doubleValue': 0.5854213833808899},
                                                          'accuracy_baseline': {'doubleValue': 0.7182232141494751},
                                                          'auc': {'doubleValue': 0.7076278924942017},
                                                          'auc_precision_recall': {'doubleValue': 0.47185906767845154},
                                                          'average_loss': {'doubleValue': 0.7366951107978821},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7105016408813877},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.6720521541950113},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6169273967107902},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.5379008746355685},
                                                          'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.41044776119402987},
                                                          'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0016168148746968471},
                                                          'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0646725949878739},
                                                          'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.22797089733225545},
                                                          'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4874696847210994},
                                                          'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.872271624898949},
                                                          'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.016129032258064516},
                                                          'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.09280742459396751},
                                                          'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.14865577227200844},
                                                          'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.19980119284294234},
                                                          'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.26176613294517226},
                                                          'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9613066920393276},
                                                          'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7519822391373295},
                                                          'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.4877894069140501},
                                                          'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.2340627973358706},
                                                          'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.034887408816999685},
                                                          'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.02824601366742597},
                                                          'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.19635535307517085},
                                                          'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.43211845102505697},
                                                          'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6874715261958998},
                                                          'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9389521640091116},
                                                          'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.971753986332574},
                                                          'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8036446469248292},
                                                          'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.567881548974943},
                                                          'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.3125284738041002},
                                                          'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.06104783599088838},
                                                          'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03869330796067238},
                                                          'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.24801776086267047},
                                                          'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5122105930859499},
                                                          'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7659372026641293},
                                                          'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9651125911830003},
                                                          'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9983831851253031},
                                                          'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9353274050121261},
                                                          'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7720291026677445},
                                                          'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5125303152789006},
                                                          'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.12772837510105092},
                                                          'label/mean': {'doubleValue': 0.2817767560482025},
                                                          'post_export_metrics/example_count': {'doubleValue': 4390.0},
                                                          'precision': {'doubleValue': 0.3830726146697998},
                                                          'prediction/mean': {'doubleValue': 0.5432790517807007},
                                                          'recall': {'doubleValue': 0.7720291018486023} },
 (('sexual_orientation', 'other_sexual_orientation'),): {'accuracy': {'doubleValue': 0.6000000238418579},
                                                         'accuracy_baseline': {'doubleValue': 0.800000011920929},
                                                         'auc': {'doubleValue': 1.0},
                                                         'auc_precision_recall': {'doubleValue': 1.0},
                                                         'average_loss': {'doubleValue': 0.7369350790977478},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.8},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.75},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6666666666666666},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.5},
                                                         'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 'NaN'},
                                                         'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 'NaN'},
                                                         'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.2},
                                                         'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.75},
                                                         'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5},
                                                         'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.25},
                                                         'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.2},
                                                         'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4},
                                                         'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6},
                                                         'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8},
                                                         'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.6},
                                                         'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.4},
                                                         'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0},
                                                         'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.25},
                                                         'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5},
                                                         'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.75},
                                                         'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 1.0},
                                                         'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.0},
                                                         'label/mean': {'doubleValue': 0.20000000298023224},
                                                         'post_export_metrics/example_count': {'doubleValue': 5.0},
                                                         'precision': {'doubleValue': 0.3333333432674408},
                                                         'prediction/mean': {'doubleValue': 0.6018183827400208},
                                                         'recall': {'doubleValue': 1.0} } }
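
The dictionary above reports each fairness indicator at five decision thresholds (0.1, 0.3, 0.5, 0.7, 0.9) per slice. These rates are not independent: TPR and FNR sum to 1, TNR and FPR sum to 1, and the overall positive rate decomposes over the label distribution. A minimal sanity check, using values copied from the `('sexual_orientation', 'other_sexual_orientation')` slice at threshold 0.5 (this is just an illustration of the identities, not a TFMA API call):

```python
# Values copied from the 'other_sexual_orientation' slice output above
# (5 examples, 1 of which is labeled toxic, evaluated at threshold 0.5).
label_mean = 0.2     # 'label/mean'
tpr = 1.0            # 'true_positive_rate@0.5'
fpr = 0.5            # 'false_positive_rate@0.5'
fnr = 0.0            # 'false_negative_rate@0.5'
tnr = 0.5            # 'true_negative_rate@0.5'
positive_rate = 0.6  # 'positive_rate@0.5'

# Complementary rates sum to 1 at any fixed threshold.
assert abs((tpr + fnr) - 1.0) < 1e-9
assert abs((tnr + fpr) - 1.0) < 1e-9

# The fraction of examples predicted positive is a label-weighted
# mixture of the true positive rate and the false positive rate:
#   positive_rate = label_mean * TPR + (1 - label_mean) * FPR
assert abs(label_mean * tpr + (1 - label_mean) * fpr - positive_rate) < 1e-9

print("slice metrics are internally consistent")
```

The same identities hold for the larger slice at the top of the output, which is one way to spot-check an evaluation before drawing conclusions from a slice. Note that slices this small (`post_export_metrics/example_count` of 5.0) produce unstable rates, which is why Fairness Indicators supports confidence intervals when comparing slices.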