Overview
Fairness Indicators is a suite of tools built on top of TensorFlow Model Analysis (TFMA) that enable regular evaluation of fairness metrics in product pipelines. TFMA is a library for evaluating both TensorFlow and non-TensorFlow machine learning models. It allows you to evaluate your models on large amounts of data in a distributed manner, compute in-graph and other metrics over different slices of data, and visualize them in notebooks.
Fairness Indicators is packaged with TensorFlow Data Validation (TFDV) and the What-If Tool. Using Fairness Indicators allows you to:
- Evaluate model performance, sliced across defined groups of users
- Gain confidence in results with confidence intervals and evaluations at multiple thresholds
- Evaluate the distribution of datasets
- Dive deep into individual slices to explore root causes and opportunities for improvement
In this notebook, you will use Fairness Indicators to fix fairness issues in a model you train on the Civil Comments dataset. Watch this video for more detail and context on the real-world scenario this is based on, which is also one of the primary motivations for creating Fairness Indicators.
Dataset
In this notebook, you will work with the Civil Comments dataset, approximately 2 million public comments made public by the Civil Comments platform in 2017 for ongoing research. This effort was sponsored by Jigsaw, who have hosted competitions on Kaggle to help classify toxic comments as well as minimize unintended model bias.
Each individual text comment in the dataset has a toxicity label, with the label being 1 if the comment is toxic and 0 if the comment is non-toxic. Within the data, a subset of comments is labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity.
Setup
Install fairness-indicators and witwidget.
pip install -q -U pip==20.2
pip install -q fairness-indicators
pip install -q witwidget
You must restart the Colab runtime after installing. Select Runtime > Restart runtime from the Colab menu.
Do not proceed with the rest of this tutorial without first restarting the runtime.
Import all other required libraries.
import os
import tempfile
import apache_beam as beam
import numpy as np
import pandas as pd
from datetime import datetime
import pprint
from google.protobuf import text_format
import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv
from tfx_bsl.tfxio import tensor_adapter
from tfx_bsl.tfxio import tf_example_record
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view
from fairness_indicators.tutorial_utils import util
from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget
from tensorflow_metadata.proto.v0 import schema_pb2
Download and analyze the data
By default, this notebook downloads a preprocessed version of this dataset, but you may use the original dataset and re-run the processing steps if desired. In the original dataset, each comment is labeled with the percentage of raters who believed that the comment corresponds to a particular identity. For example, a comment might be labeled with the following: { male: 0.3, female: 1.0, transgender: 0.0, heterosexual: 0.8, homosexual_gay_or_lesbian: 1.0 }. The processing step groups identities by category (gender, sexual_orientation, etc.) and removes identities with a score below 0.5. So the example above would be converted to the following: { gender: [female], sexual_orientation: [heterosexual, homosexual_gay_or_lesbian] }
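The conversion described above can be sketched as follows. This is an illustrative re-implementation, not the actual `util.convert_comments_data` helper, and the category map below is a small hypothetical subset of the real `IDENTITY_COLUMNS`:

```python
# Hypothetical sketch of the preprocessing step: group raw per-identity rater
# scores by category and keep only identities scored at or above the threshold.
IDENTITY_CATEGORIES = {
    'gender': ['male', 'female', 'transgender'],
    'sexual_orientation': ['heterosexual', 'homosexual_gay_or_lesbian'],
}

def group_identities(raw_scores, threshold=0.5):
  """Maps {identity: rater score} to {category: [identities kept]}."""
  grouped = {}
  for category, identities in IDENTITY_CATEGORIES.items():
    kept = [i for i in identities if raw_scores.get(i, 0.0) >= threshold]
    if kept:
      grouped[category] = kept
  return grouped

print(group_identities({
    'male': 0.3, 'female': 1.0, 'transgender': 0.0,
    'heterosexual': 0.8, 'homosexual_gay_or_lesbian': 1.0,
}))
# {'gender': ['female'],
#  'sexual_orientation': ['heterosexual', 'homosexual_gay_or_lesbian']}
```
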
download_original_data = False

if download_original_data:
  train_tf_file = tf.keras.utils.get_file(
      'train_tf.tfrecord',
      'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file(
      'validate_tf.tfrecord',
      'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

  # The identity terms list will be grouped together by their categories
  # (see 'IDENTITY_COLUMNS') on threshold 0.5. Only the identity term column,
  # text column and label column will be kept after processing.
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)
else:
  train_tf_file = tf.keras.utils.get_file(
      'train_tf_processed.tfrecord',
      'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file(
      'validate_tf_processed.tfrecord',
      'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')
Use TFDV to analyze the data and find potential problems in it, such as missing values and data imbalances, that can lead to fairness disparities.
stats = tfdv.generate_statistics_from_tfrecord(data_location=train_tf_file)
tfdv.visualize_statistics(stats)
TFDV shows that there are some significant imbalances in the data which could lead to biased model outcomes.
The toxicity label (the value predicted by the model) is unbalanced. Only 8% of the examples in the training set are toxic, which means that a classifier could get 92% accuracy by predicting that all comments are non-toxic.
In the fields relating to identity terms, only 6.6k of the 1.08 million (0.61%) training examples deal with homosexuality, and those related to bisexuality are even more rare. This indicates that performance on these slices may suffer due to a lack of training data.
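The two imbalance figures above can be sanity-checked with quick arithmetic, using the rounded numbers quoted in the text rather than recomputing from the raw data:

```python
# Rounded figures quoted above (assumed, not recomputed from the dataset).
total_examples = 1_080_000
toxic_fraction = 0.08
homosexual_examples = 6_600

# A classifier that always predicts "non-toxic" is right 92% of the time.
majority_class_accuracy = 1.0 - toxic_fraction
print(f'Always-non-toxic accuracy: {majority_class_accuracy:.0%}')  # 92%

# Only ~0.61% of training examples mention homosexuality.
slice_fraction = homosexual_examples / total_examples
print(f'homosexual_gay_or_lesbian slice: {slice_fraction:.2%}')  # 0.61%
```
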
Prepare the data
Define a feature map to parse the data. Each example will have a label, comment text, and identity features sexual_orientation, gender, religion, race, and disability that are associated with the text.
BASE_DIR = tempfile.gettempdir()
TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'
FEATURE_MAP = {
    # Label:
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    # Text:
    TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),

    # Identities:
    'sexual_orientation': tf.io.VarLenFeature(tf.string),
    'gender': tf.io.VarLenFeature(tf.string),
    'religion': tf.io.VarLenFeature(tf.string),
    'race': tf.io.VarLenFeature(tf.string),
    'disability': tf.io.VarLenFeature(tf.string),
}
Next, set up an input function to feed data into the model. Add a weight column to each example and upweight the toxic examples to account for the class imbalance identified by TFDV. Identity features are used only during the evaluation phase, as only the comments are fed into the model during training.
def train_input_fn():
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example, parsed_example[LABEL])

  train_dataset = tf.data.TFRecordDataset(
      filenames=[train_tf_file]).map(parse_function).batch(512)
  return train_dataset
Train the model
Create and train a deep learning model on the data.
model_dir = os.path.join(BASE_DIR, 'train',
                         datetime.now().strftime('%Y%m%d-%H%M%S'))

embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    n_classes=2,
    model_dir=model_dir)

classifier.train(input_fn=train_input_fn, steps=1000)
Analyze the model
After obtaining the trained model, analyze it to compute fairness metrics using TFMA and Fairness Indicators. Start by exporting the model as a SavedModel.
Export the SavedModel
def eval_input_receiver_fn():
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')

  # This *must* be a dictionary containing a single key 'examples', which
  # points to the input placeholder.
  receiver_tensors = {'examples': serialized_tf_example}

  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])

  return tfma.export.EvalInputReceiver(
      features=features,
      receiver_tensors=receiver_tensors,
      labels=features[LABEL])

tfma_export_dir = tfma.export.export_eval_savedmodel(
    estimator=classifier,
    export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),
    eval_input_receiver_fn=eval_input_receiver_fn)
Compute Fairness Metrics
Select the identity to compute metrics for, and whether to run with confidence intervals, using the dropdown menus in the panel on the right.
Fairness Indicators Computation Options
tfma_eval_result_path = os.path.join(BASE_DIR, 'tfma_eval_result')

slice_selection = 'sexual_orientation'
print(f'Slice selection: {slice_selection}')

compute_confidence_intervals = False
print(f'Compute confidence intervals: {compute_confidence_intervals}')

# Define slices that you want the evaluation to run on.
eval_config_pbtxt = """
    model_specs {
      label_key: "%s"
    }
    metrics_specs {
      metrics {
        class_name: "FairnessIndicators"
        config: '{ "thresholds": [0.1, 0.3, 0.5, 0.7, 0.9] }'
      }
    }
    slicing_specs {}  # overall slice
    slicing_specs {
      feature_keys: ["%s"]
    }
    options {
      compute_confidence_intervals { value: %s }
      disabled_outputs { values: "analysis" }
    }
""" % (LABEL, slice_selection, compute_confidence_intervals)
eval_config = text_format.Parse(eval_config_pbtxt, tfma.EvalConfig())
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=tfma_export_dir)

schema = text_format.Parse(
    """
    tensor_representation_group {
      key: ""
      value {
        tensor_representation {
          key: "comment_text"
          value {
            dense_tensor {
              column_name: "comment_text"
              shape {}
            }
          }
        }
      }
    }
    feature {
      name: "comment_text"
      type: BYTES
    }
    feature {
      name: "toxicity"
      type: FLOAT
    }
    feature {
      name: "sexual_orientation"
      type: BYTES
    }
    feature {
      name: "gender"
      type: BYTES
    }
    feature {
      name: "religion"
      type: BYTES
    }
    feature {
      name: "race"
      type: BYTES
    }
    feature {
      name: "disability"
      type: BYTES
    }
    """, schema_pb2.Schema())

tfxio = tf_example_record.TFExampleRecord(
    file_pattern=validate_tf_file,
    schema=schema,
    raw_record_column_name=tfma.ARROW_INPUT_COLUMN)
tensor_adapter_config = tensor_adapter.TensorAdapterConfig(
    arrow_schema=tfxio.ArrowSchema(),
    tensor_representations=tfxio.TensorRepresentations())

with beam.Pipeline() as pipeline:
  (pipeline
   | 'ReadFromTFRecordToArrow' >> tfxio.BeamSource()
   | 'ExtractEvaluateAndWriteResults' >> tfma.ExtractEvaluateAndWriteResults(
       eval_config=eval_config,
       eval_shared_model=eval_shared_model,
       output_path=tfma_eval_result_path,
       tensor_adapter_config=tensor_adapter_config))
eval_result = tfma.load_eval_result(output_path=tfma_eval_result_path)
Slice selection: sexual_orientation
Compute confidence intervals: False
Visualize data using the What-If Tool
In this section, you will use the What-If Tool's interactive visual interface to explore and manipulate data at a micro level.
Each point on the scatter plot in the right-hand panel represents one of the examples in the subset loaded into the tool. Click one of the points to see details about this particular example in the left-hand panel. The comment text, ground-truth toxicity, and applicable identities are shown. At the bottom of this left-hand panel, you see the inference results from the model you just trained.
Modify the text of the example and then click the Run inference button to see how your changes cause the perceived toxicity prediction to change.
DEFAULT_MAX_EXAMPLES = 1000

# Load 100000 examples in memory. When first rendered,
# What-If Tool should only display 1000 of these due to browser constraints.
def wit_dataset(file, num_examples=100000):
  dataset = tf.data.TFRecordDataset(
      filenames=[file]).take(num_examples)
  return [tf.train.Example.FromString(d.numpy()) for d in dataset]

wit_data = wit_dataset(train_tf_file)
config_builder = WitConfigBuilder(
    wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(
        classifier, FEATURE_MAP).set_label_vocab(
            ['non-toxicity', LABEL]).set_target_feature(LABEL)
wit = WitWidget(config_builder)
Render Fairness Indicators
Render the Fairness Indicators widget with the exported evaluation results.
Below, you will see bar charts displaying the performance of each slice of the data on the selected metrics. You can adjust the baseline comparison slice, as well as the thresholds displayed, using the dropdown menus at the top of the visualization.
The Fairness Indicators widget is integrated with the What-If Tool rendered above. If you select one slice of the data in the bar chart, the What-If Tool will update to show examples from the selected slice. When the data reloads in the What-If Tool above, try changing Color By to toxicity. This can give you a visual understanding of the toxicity balance of examples by slice.
event_handlers = {'slice-selected':
                  wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}

widget_view.render_fairness_indicator(eval_result=eval_result,
                                      slicing_column=slice_selection,
                                      event_handlers=event_handlers)
FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Overall', 'slice': 'Overall', 'metrics': {'prediction/…
With this particular dataset and task, systematically higher false positive and false negative rates for certain identities can lead to negative consequences. For example, in a content moderation system, a higher-than-overall false positive rate for a certain group can lead to those voices being silenced. Thus, it is important to regularly evaluate these types of criteria as you develop and improve models, and to use tools such as Fairness Indicators, TFDV, and WIT to help illuminate potential problems. Once you've identified fairness issues, you can experiment with new data sources, data balancing, or other techniques to improve performance on underperforming groups.
See here for more information and guidance on how Fairness Indicators can be used.
Use fairness evaluation results
The eval_result object, rendered above in render_fairness_indicator(), has its own API that you can leverage to read TFMA results into your programs.
Get evaluated slices and metrics
Use get_slice_names() and get_metric_names() to get the evaluated slices and metrics, respectively.
pp = pprint.PrettyPrinter()
print("Slices:")
pp.pprint(eval_result.get_slice_names())
print("\nMetrics:")
pp.pprint(eval_result.get_metric_names())
Slices: [(), (('sexual_orientation', 'homosexual_gay_or_lesbian'),), (('sexual_orientation', 'heterosexual'),), (('sexual_orientation', 'bisexual'),), (('sexual_orientation', 'other_sexual_orientation'),)] Metrics: ['fairness_indicators_metrics/negative_rate@0.1', 'fairness_indicators_metrics/positive_rate@0.7', 'fairness_indicators_metrics/false_discovery_rate@0.9', 'fairness_indicators_metrics/false_negative_rate@0.3', 'fairness_indicators_metrics/false_omission_rate@0.1', 'accuracy', 'fairness_indicators_metrics/false_discovery_rate@0.7', 'fairness_indicators_metrics/false_negative_rate@0.7', 'label/mean', 'fairness_indicators_metrics/true_positive_rate@0.5', 'fairness_indicators_metrics/false_positive_rate@0.1', 'recall', 'fairness_indicators_metrics/false_omission_rate@0.7', 'fairness_indicators_metrics/false_positive_rate@0.7', 'auc_precision_recall', 'fairness_indicators_metrics/negative_rate@0.7', 'fairness_indicators_metrics/negative_rate@0.3', 'fairness_indicators_metrics/false_discovery_rate@0.3', 'fairness_indicators_metrics/true_negative_rate@0.9', 'fairness_indicators_metrics/false_omission_rate@0.3', 'fairness_indicators_metrics/false_negative_rate@0.1', 'fairness_indicators_metrics/true_negative_rate@0.3', 'fairness_indicators_metrics/true_positive_rate@0.7', 'fairness_indicators_metrics/false_positive_rate@0.3', 'fairness_indicators_metrics/true_positive_rate@0.1', 'fairness_indicators_metrics/true_positive_rate@0.9', 'fairness_indicators_metrics/false_negative_rate@0.9', 'fairness_indicators_metrics/positive_rate@0.5', 'fairness_indicators_metrics/positive_rate@0.9', 'fairness_indicators_metrics/negative_rate@0.9', 'fairness_indicators_metrics/true_negative_rate@0.1', 'fairness_indicators_metrics/false_omission_rate@0.5', 'post_export_metrics/example_count', 'fairness_indicators_metrics/false_omission_rate@0.9', 'fairness_indicators_metrics/negative_rate@0.5', 'fairness_indicators_metrics/false_positive_rate@0.5', 
'fairness_indicators_metrics/positive_rate@0.3', 'prediction/mean', 'accuracy_baseline', 'fairness_indicators_metrics/true_negative_rate@0.5', 'fairness_indicators_metrics/false_discovery_rate@0.5', 'fairness_indicators_metrics/false_discovery_rate@0.1', 'precision', 'fairness_indicators_metrics/false_positive_rate@0.9', 'fairness_indicators_metrics/true_positive_rate@0.3', 'auc', 'average_loss', 'fairness_indicators_metrics/positive_rate@0.1', 'fairness_indicators_metrics/false_negative_rate@0.5', 'fairness_indicators_metrics/true_negative_rate@0.7']
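The list above mixes standard model metrics (accuracy, auc, recall) with Fairness Indicators metrics computed at several decision thresholds, identified by the @threshold suffix. A minimal sketch of filtering the names down to one threshold, using a hand-copied sample of the names printed above instead of a live eval_result object:

```python
# A few metric names copied from the output above.
metric_names = [
    'fairness_indicators_metrics/false_positive_rate@0.5',
    'fairness_indicators_metrics/false_negative_rate@0.5',
    'fairness_indicators_metrics/false_positive_rate@0.9',
    'accuracy',
    'auc',
]

# Fairness Indicators metrics are computed at several decision thresholds;
# the suffix '@<threshold>' identifies each one. Keep only those at 0.5.
at_half = sorted(n for n in metric_names if n.endswith('@0.5'))
print(at_half)
```

With a real eval_result, the same comprehension can be applied to eval_result.get_metric_names().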
Use get_metrics_for_slice() to get the metrics for a particular slice, returned as a dictionary mapping metric names to metric values.
baseline_slice = ()
heterosexual_slice = (('sexual_orientation', 'heterosexual'),)
print("Baseline metric values:")
pp.pprint(eval_result.get_metrics_for_slice(baseline_slice))
print("\nHeterosexual metric values:")
pp.pprint(eval_result.get_metrics_for_slice(heterosexual_slice))
Baseline metric values: {'accuracy': {'doubleValue': 0.7174859642982483}, 'accuracy_baseline': {'doubleValue': 0.9198060631752014}, 'auc': {'doubleValue': 0.796409547328949}, 'auc_precision_recall': {'doubleValue': 0.3000231087207794}, 'average_loss': {'doubleValue': 0.5615971088409424}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.9139404145348933}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.8796606156634021}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.816806708107944}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7090802784427505}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4814937210839392}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.006079867348348763}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.08696628437197734}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.2705713693519414}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.5445108470360647}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.891598728755009}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.006604499315158452}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.017811407791031682}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.03187681488249431}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.04993640137936933}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.07271999842219298}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9202700382800194}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.5818879187535954}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28355525303665063}, 
'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.09679333307231039}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.00877639469079322}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.07382367199944595}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.39155620195304386}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.6806884133250225}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.8744414433132488}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9832342960038783}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.926176328000554}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.6084437980469561}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.3193115866749775}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.12555855668675117}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.016765703996121616}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0797299617199806}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.41811208124640464}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.7164447469633494}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.9032066669276896}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9912236053092068}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9939201326516512}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9130337156280227}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7294286306480586}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.45548915296393533}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.10840127124499102}, 'label/mean': {'doubleValue': 
0.08019392192363739}, 'post_export_metrics/example_count': {'doubleValue': 721950.0}, 'precision': {'doubleValue': 0.18319329619407654}, 'prediction/mean': {'doubleValue': 0.3998037576675415}, 'recall': {'doubleValue': 0.7294286489486694} } Heterosexual metric values: {'accuracy': {'doubleValue': 0.5203251838684082}, 'accuracy_baseline': {'doubleValue': 0.7601625919342041}, 'auc': {'doubleValue': 0.6672822833061218}, 'auc_precision_recall': {'doubleValue': 0.4065391719341278}, 'average_loss': {'doubleValue': 0.8273133039474487}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7541666666666667}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7272727272727273}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7062937062937062}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.655367231638418}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4473684210526316}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0847457627118644}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.288135593220339}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4830508474576271}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8220338983050848}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.10416666666666667}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.1650485436893204}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.18095238095238095}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.21365638766519823}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9679144385026738}, 
'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7700534759358288}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5401069518716578}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.31016042780748665}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.045454545454545456}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.024390243902439025}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.1951219512195122}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4186991869918699}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6402439024390244}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9227642276422764}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.975609756097561}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8048780487804879}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5813008130081301}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.3597560975609756}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.07723577235772358}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03208556149732621}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.22994652406417113}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.45989304812834225}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.6898395721925134}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9545454545454546}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9152542372881356}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.711864406779661}, 
'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5169491525423728}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.17796610169491525}, 'label/mean': {'doubleValue': 0.2398373931646347}, 'post_export_metrics/example_count': {'doubleValue': 492.0}, 'precision': {'doubleValue': 0.2937062978744507}, 'prediction/mean': {'doubleValue': 0.5578703880310059}, 'recall': {'doubleValue': 0.7118644118309021} }
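Each metric value in the returned dictionary is wrapped in a {'doubleValue': ...} entry, so comparing a metric across slices takes one level of unwrapping. A minimal sketch using the false positive rate at threshold 0.5, with values copied from the output above (the metric_value helper is illustrative, not part of TFMA):

```python
# Values copied from the output above; each metric is wrapped in a
# {'doubleValue': ...} dict, mirroring TFMA's serialized format.
baseline_metrics = {
    'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28355525303665063},
    'recall': {'doubleValue': 0.7294286489486694},
}
heterosexual_metrics = {
    'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5401069518716578},
    'recall': {'doubleValue': 0.7118644118309021},
}

def metric_value(metrics, name):
    """Unwrap the 'doubleValue' field of a single metric."""
    return metrics[name]['doubleValue']

# The heterosexual slice has a markedly higher false positive rate
# than the overall baseline at the 0.5 threshold.
fpr_key = 'fairness_indicators_metrics/false_positive_rate@0.5'
ratio = metric_value(heterosexual_metrics, fpr_key) / metric_value(baseline_metrics, fpr_key)
print(f"FPR@0.5 ratio (heterosexual / baseline): {ratio:.2f}")
```

The same unwrapping applies directly to the dictionaries returned by eval_result.get_metrics_for_slice().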
Use get_metrics_for_all_slices() to get the metrics for all slices, returned as a dictionary mapping each slice to the metrics dictionary you would get from running get_metrics_for_slice() on it.
pp.pprint(eval_result.get_metrics_for_all_slices())
{(): {'accuracy': {'doubleValue': 0.7174859642982483}, 'accuracy_baseline': {'doubleValue': 0.9198060631752014}, 'auc': {'doubleValue': 0.796409547328949}, 'auc_precision_recall': {'doubleValue': 0.3000231087207794}, 'average_loss': {'doubleValue': 0.5615971088409424}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.9139404145348933}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.8796606156634021}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.816806708107944}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7090802784427505}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4814937210839392}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.006079867348348763}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.08696628437197734}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.2705713693519414}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.5445108470360647}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.891598728755009}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.006604499315158452}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.017811407791031682}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.03187681488249431}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.04993640137936933}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.07271999842219298}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9202700382800194}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.5818879187535954}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28355525303665063}, 'fairness_indicators_metrics/false_positive_rate@0.7': 
{'doubleValue': 0.09679333307231039}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.00877639469079322}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.07382367199944595}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.39155620195304386}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.6806884133250225}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.8744414433132488}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9832342960038783}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.926176328000554}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.6084437980469561}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.3193115866749775}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.12555855668675117}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.016765703996121616}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0797299617199806}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.41811208124640464}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.7164447469633494}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.9032066669276896}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9912236053092068}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9939201326516512}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9130337156280227}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7294286306480586}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.45548915296393533}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.10840127124499102}, 'label/mean': {'doubleValue': 0.08019392192363739}, 'post_export_metrics/example_count': 
{'doubleValue': 721950.0}, 'precision': {'doubleValue': 0.18319329619407654}, 'prediction/mean': {'doubleValue': 0.3998037576675415}, 'recall': {'doubleValue': 0.7294286489486694} }, (('sexual_orientation', 'bisexual'),): {'accuracy': {'doubleValue': 0.5258620977401733}, 'accuracy_baseline': {'doubleValue': 0.8017241358757019}, 'auc': {'doubleValue': 0.6252922415733337}, 'auc_precision_recall': {'doubleValue': 0.3546649217605591}, 'average_loss': {'doubleValue': 0.7461641430854797}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7870370370370371}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7816091954022989}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7666666666666667}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7037037037037037}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.17391304347826086}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.391304347826087}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.6521739130434783}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.9130434782608695}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.13793103448275862}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.16071428571428573}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.16853932584269662}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.18421052631578946}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9139784946236559}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 
0.7311827956989247}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.4946236559139785}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.20430107526881722}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.06896551724137931}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.25}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4827586206896552}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.7672413793103449}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9827586206896551}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9310344827586207}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.75}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5172413793103449}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.23275862068965517}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.017241379310344827}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.08602150537634409}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.26881720430107525}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5053763440860215}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7956989247311828}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.8260869565217391}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.6086956521739131}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.34782608695652173}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 
0.08695652173913043}, 'label/mean': {'doubleValue': 0.1982758641242981}, 'post_export_metrics/example_count': {'doubleValue': 116.0}, 'precision': {'doubleValue': 0.23333333432674408}, 'prediction/mean': {'doubleValue': 0.4908219575881958}, 'recall': {'doubleValue': 0.6086956262588501} }, (('sexual_orientation', 'heterosexual'),): {'accuracy': {'doubleValue': 0.5203251838684082}, 'accuracy_baseline': {'doubleValue': 0.7601625919342041}, 'auc': {'doubleValue': 0.6672822833061218}, 'auc_precision_recall': {'doubleValue': 0.4065391719341278}, 'average_loss': {'doubleValue': 0.8273133039474487}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7541666666666667}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7272727272727273}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7062937062937062}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.655367231638418}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4473684210526316}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0847457627118644}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.288135593220339}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4830508474576271}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8220338983050848}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.10416666666666667}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.1650485436893204}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.18095238095238095}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.21365638766519823}, 
'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9679144385026738}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7700534759358288}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5401069518716578}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.31016042780748665}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.045454545454545456}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.024390243902439025}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.1951219512195122}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4186991869918699}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6402439024390244}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9227642276422764}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.975609756097561}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8048780487804879}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5813008130081301}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.3597560975609756}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.07723577235772358}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03208556149732621}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.22994652406417113}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.45989304812834225}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.6898395721925134}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9545454545454546}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9152542372881356}, 
'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.711864406779661}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5169491525423728}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.17796610169491525}, 'label/mean': {'doubleValue': 0.2398373931646347}, 'post_export_metrics/example_count': {'doubleValue': 492.0}, 'precision': {'doubleValue': 0.2937062978744507}, 'prediction/mean': {'doubleValue': 0.5578703880310059}, 'recall': {'doubleValue': 0.7118644118309021} }, (('sexual_orientation', 'homosexual_gay_or_lesbian'),): {'accuracy': {'doubleValue': 0.5851936340332031}, 'accuracy_baseline': {'doubleValue': 0.7182232141494751}, 'auc': {'doubleValue': 0.7057511806488037}, 'auc_precision_recall': {'doubleValue': 0.469566285610199}, 'average_loss': {'doubleValue': 0.7369641661643982}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7107050831576481}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.6717557251908397}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6172690763052209}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.5427319211102994}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4092664092664093}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0016168148746968471}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.06143896523848019}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.22958771220695232}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4939369442198868}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8763136620856912}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.01652892561983471}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.08909730363423213}, 
'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.14947368421052631}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.20225091029460443}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.2624061970467199}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9622581668252458}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7535680304471931}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.4874722486520774}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.2356485886457342}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.03361877576910879}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.0275626423690205}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.19430523917995443}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4328018223234624}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6881548974943053}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.941002277904328}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9724373576309795}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8056947608200455}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5671981776765376}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.31184510250569475}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.05899772209567198}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0377418331747542}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.24643196955280686}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5125277513479226}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7643514113542658}, 
'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9663812242308912}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9983831851253031}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9385610347615198}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7704122877930477}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5060630557801131}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.12368633791430882}, 'label/mean': {'doubleValue': 0.2817767560482025}, 'post_export_metrics/example_count': {'doubleValue': 4390.0}, 'precision': {'doubleValue': 0.3827309310436249}, 'prediction/mean': {'doubleValue': 0.5428739786148071}, 'recall': {'doubleValue': 0.770412266254425} }, (('sexual_orientation', 'other_sexual_orientation'),): {'accuracy': {'doubleValue': 0.6000000238418579}, 'accuracy_baseline': {'doubleValue': 0.800000011920929}, 'auc': {'doubleValue': 1.0}, 'auc_precision_recall': {'doubleValue': 1.0}, 'average_loss': {'doubleValue': 0.7521011829376221}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.8}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.75}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6666666666666666}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.5}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 'NaN'}, 
'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.75}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.25}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.2}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.8}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.6}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.4}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.2}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.25}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.75}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.5': 
{'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 1.0}, 'label/mean': {'doubleValue': 0.20000000298023224}, 'post_export_metrics/example_count': {'doubleValue': 5.0}, 'precision': {'doubleValue': 0.3333333432674408}, 'prediction/mean': {'doubleValue': 0.6101843118667603}, 'recall': {'doubleValue': 1.0} } }
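Since the keys of this dictionary are slice tuples, it is straightforward to turn the result into a per-slice summary, which also surfaces example_count so very small slices (like other_sexual_orientation with 5 examples) can be read with caution. A minimal sketch over a condensed stand-in for the output above (a real eval_result.get_metrics_for_all_slices() result plugs in directly):

```python
# A condensed stand-in for the get_metrics_for_all_slices() result above:
# keys are slice tuples, values are metric dictionaries.
all_slice_metrics = {
    (): {
        'post_export_metrics/example_count': {'doubleValue': 721950.0},
        'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28355525303665063},
    },
    (('sexual_orientation', 'heterosexual'),): {
        'post_export_metrics/example_count': {'doubleValue': 492.0},
        'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5401069518716578},
    },
    (('sexual_orientation', 'bisexual'),): {
        'post_export_metrics/example_count': {'doubleValue': 116.0},
        'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.4946236559139785},
    },
}

fpr_key = 'fairness_indicators_metrics/false_positive_rate@0.5'
rows = []
for slice_key, metrics in all_slice_metrics.items():
    # Render the baseline (empty tuple) as 'Overall'.
    name = slice_key[0][1] if slice_key else 'Overall'
    rows.append((
        name,
        int(metrics['post_export_metrics/example_count']['doubleValue']),
        metrics[fpr_key]['doubleValue'],
    ))

# Sort slices by false positive rate, highest first.
for name, count, fpr in sorted(rows, key=lambda r: r[2], reverse=True):
    print(f"{name:<15} n={count:<8} FPR@0.5={fpr:.3f}")
```

A summary like this makes the disparity visible at a glance: both sexual-orientation slices shown have a substantially higher false positive rate than the overall baseline.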