TensorFlow Constrained Optimization Example Using CelebA Dataset

This notebook demonstrates an easy way to create and optimize constrained problems using the TFCO library. This method can be useful for improving models that do not perform equally well across different slices of the data, which we can identify using Fairness Indicators. The second of Google's AI principles states that our technology should avoid creating or reinforcing unfair bias, and we believe this technique can help improve model fairness in some situations. In particular, this notebook will:

Train a simple, unconstrained neural network model to detect a person's smile in images using tf.keras and the large-scale CelebFaces Attributes (CelebA) dataset.
Evaluate model performance against a commonly used fairness metric across age groups, using Fairness Indicators.
Set up a simple constrained optimization problem to achieve fairer performance across age groups.
Retrain the now constrained model and evaluate performance again, to check that the chosen fairness metric has improved.

Last updated: 3/11 Feb 2020

Installation

This notebook was created in Colaboratory, connected to the Python 3 Google Compute Engine backend. If you wish to host this notebook in a different environment, you should not experience any major issues provided you include all the required packages in the cells below.

Note that the very first time you run the pip installs, you may be asked to restart the runtime because of preinstalled out-of-date packages. Once you do so, the correct packages will be used.

Pip installs

!pip install -q -U pip==20.2

!pip install git+https://github.com/google-research/tensorflow_constrained_optimization
!pip install -q tensorflow-datasets tensorflow
!pip install fairness-indicators \
  "absl-py==0.12.0" \
  "apache-beam<3,>=2.34" \
  "avro-python3==1.9.1" \
  "pyzmq==17.0.0"

Note that depending on when you run the cell below, you may receive a warning about the default version of TensorFlow in Colab switching to TensorFlow 2.X soon. You can safely ignore that warning, as this notebook was designed to be compatible with TensorFlow 1.X and 2.X.
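After the pip installs have run and the runtime has been restarted, it can help to confirm which package versions were actually picked up. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is hypothetical and not part of the original notebook.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the pip commands in this section.
for name in ["tensorflow", "tensorflow-datasets", "fairness-indicators", "apache-beam"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a version printed here differs from what the pip cell requested, restarting the runtime and re-running the imports is the usual first thing to try.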

Import Modules

import os
import sys
import tempfile
import urllib.request  # urlretrieve lives in the urllib.request submodule

import tensorflow as tf
from tensorflow import keras

import tensorflow_datasets as tfds
tfds.disable_progress_bar()

import numpy as np

import tensorflow_constrained_optimization as tfco

from tensorflow_metadata.proto.v0 import schema_pb2
from tfx_bsl.tfxio import tensor_adapter
from tfx_bsl.tfxio import tf_example_record

Additionally, we add a few imports that are specific to Fairness Indicators, which we will use to evaluate and visualize the model's performance.

Fairness Indicators related imports

import tensorflow_model_analysis as tfma
import fairness_indicators as fi
from google.protobuf import text_format
import apache_beam as beam

Although TFCO is compatible with both eager and graph execution, this notebook assumes that eager execution is enabled by default, as it is in TensorFlow 2.x. To ensure that nothing breaks, eager execution is enabled in the cell below.

Enable Eager Execution and Print Versions

if int(tf.__version__.split(".")[0]) < 2:  # compare the major version numerically, not as a string
  tf.compat.v1.enable_eager_execution()
  print("Eager execution enabled.")
else:
  print("Eager execution enabled by default.")

print("TensorFlow " + tf.__version__)
print("TFMA " + tfma.VERSION_STRING)
print("TFDS " + tfds.version.__version__)
print("FI " + fi.version.__version__)

Eager execution enabled by default.
TensorFlow 2.8.0-rc0
TFMA 0.36.0
TFDS 4.4.0
FI 0.36.0

CelebA Dataset

CelebA is a large-scale face attributes dataset with more than 200,000 celebrity images, each with 40 attribute annotations (such as hair type, fashion accessories, facial features, etc.) and 5 landmark locations (eyes, mouth and nose positions). For more details, see the paper. With the permission of the owners, we have stored this dataset on Google Cloud Storage and mostly access it via TensorFlow Datasets (tfds).

In this notebook:

Our model will attempt to classify whether the subject of the image is smiling, as represented by the "Smiling" attribute.*
Images will be resized from 218x178 to 28x28 to reduce the execution time and memory when training.
Our model's performance will be evaluated across age groups, using the binary "Young" attribute. We will call this "age group" in this notebook.

* While there is little information available about the labeling methodology for this dataset, we will assume that the "Smiling" attribute was determined by a pleased, kind, or amused expression on the subject's face. For the purpose of this case study, we will take these labels as ground truth.

gcs_base_dir = "gs://celeb_a_dataset/"
celeb_a_builder = tfds.builder("celeb_a", data_dir=gcs_base_dir, version='2.0.0')

celeb_a_builder.download_and_prepare()

num_test_shards_dict = {'0.3.0': 4, '2.0.0': 2} # Used because we download the test dataset separately
version = str(celeb_a_builder.info.version)
print('Celeb_A dataset version: %s' % version)

Celeb_A dataset version: 2.0.0

Test dataset helper functions

local_root = tempfile.mkdtemp(prefix='test-data')
def local_test_filename_base():
  return local_root

def local_test_file_full_prefix():
  return os.path.join(local_test_filename_base(), "celeb_a-test.tfrecord")

def copy_test_files_to_local():
  filename_base = local_test_file_full_prefix()
  num_test_shards = num_test_shards_dict[version]
  for shard in range(num_test_shards):
    url = "https://storage.googleapis.com/celeb_a_dataset/celeb_a/%s/celeb_a-test.tfrecord-0000%s-of-0000%s" % (version, shard, num_test_shards)
    filename = "%s-0000%s-of-0000%s" % (filename_base, shard, num_test_shards)
    res = urllib.request.urlretrieve(url, filename)
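The helper above builds shard names by prepending a fixed `0000` to a single digit, which matches the two- and four-shard cases used here. As a small sketch of the same naming pattern with explicit zero-padding (the function name `shard_filename` is hypothetical), the padding can be expressed with a `%05d` format so that it would also hold for ten or more shards:

```python
def shard_filename(prefix, shard, num_shards):
    """Build a TFRecord shard name such as 'prefix-00000-of-00002'."""
    return "%s-%05d-of-%05d" % (prefix, shard, num_shards)


# Reproduce the two test-shard names used for dataset version 2.0.0.
names = [shard_filename("celeb_a-test.tfrecord", s, 2) for s in range(2)]
for name in names:
    print(name)
```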

Caveats

Before moving forward, there are several considerations to keep in mind when using CelebA:

Although in principle this notebook could use any dataset of face images, CelebA was chosen because it contains public domain images of public figures.
All of the attribute annotations in CelebA are operationalized as binary categories. For example, the "Young" attribute (as determined by the dataset labelers) is denoted as either present or absent in the image.
CelebA's categorizations do not reflect the real human diversity of attributes.
For the purposes of this notebook, the feature containing the "Young" attribute is referred to as "age group", where the presence of the "Young" attribute in an image is labeled as a member of the "Young" age group and the absence of the "Young" attribute is labeled as a member of the "Not Young" age group. These are assumptions, as this information is not mentioned in the original paper.
As such, the performance of the models trained in this notebook is tied to the ways the attributes have been operationalized and annotated by the authors of CelebA.
This model should not be used for commercial purposes, as that would violate CelebA's non-commercial research agreement.

Setting Up Input Functions

The subsequent cells help streamline the input pipeline and visualize performance.

First, we define some data-related variables and the requisite preprocessing functions.

Define Variables

ATTR_KEY = "attributes"
IMAGE_KEY = "image"
LABEL_KEY = "Smiling"
GROUP_KEY = "Young"
IMAGE_SIZE = 28

Define Preprocessing Functions

def preprocess_input_dict(feat_dict):
  # Separate out the image and target variable from the feature dictionary.
  image = feat_dict[IMAGE_KEY]
  label = feat_dict[ATTR_KEY][LABEL_KEY]
  group = feat_dict[ATTR_KEY][GROUP_KEY]

  # Resize and normalize image.
  image = tf.cast(image, tf.float32)
  image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
  image /= 255.0

  # Cast label and group to float32.
  label = tf.cast(label, tf.float32)
  group = tf.cast(group, tf.float32)

  feat_dict[IMAGE_KEY] = image
  feat_dict[ATTR_KEY][LABEL_KEY] = label
  feat_dict[ATTR_KEY][GROUP_KEY] = group

  return feat_dict

get_image_and_label = lambda feat_dict: (feat_dict[IMAGE_KEY], feat_dict[ATTR_KEY][LABEL_KEY])
get_image_label_and_group = lambda feat_dict: (feat_dict[IMAGE_KEY], feat_dict[ATTR_KEY][LABEL_KEY], feat_dict[ATTR_KEY][GROUP_KEY])

Then, we build the data functions we need in the rest of the colab.

# Train data returning either 2 or 3 elements (the third element being the group)
def celeb_a_train_data_wo_group(batch_size):
  celeb_a_train_data = celeb_a_builder.as_dataset(split='train').shuffle(1024).repeat().batch(batch_size).map(preprocess_input_dict)
  return celeb_a_train_data.map(get_image_and_label)
def celeb_a_train_data_w_group(batch_size):
  celeb_a_train_data = celeb_a_builder.as_dataset(split='train').shuffle(1024).repeat().batch(batch_size).map(preprocess_input_dict)
  return celeb_a_train_data.map(get_image_label_and_group)

# Test data for the overall evaluation
celeb_a_test_data = celeb_a_builder.as_dataset(split='test').batch(1).map(preprocess_input_dict).map(get_image_label_and_group)
# Copy test data locally to be able to read it into tfma
copy_test_files_to_local()

Build a simple DNN Model

Because this notebook focuses on TFCO, we will assemble a simple, unconstrained tf.keras.Sequential model.

We may be able to greatly improve model performance by adding some complexity (e.g., more densely connected layers, different activation functions, a larger image size), but that may distract from the goal of demonstrating how easy it is to apply the TFCO library when working with Keras. For that reason, the model is kept simple, but feel encouraged to explore this space.

def create_model():
  # For this notebook, accuracy will be used to evaluate performance.
  METRICS = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy')
  ]

  # The model consists of:
  # 1. An input layer that represents the 28x28x3 image flatten.
  # 2. A fully connected layer with 64 units activated by a ReLU function.
  # 3. A single-unit readout layer to output real-scores instead of probabilities.
  model = keras.Sequential([
      keras.layers.Flatten(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), name='image'),
      keras.layers.Dense(64, activation='relu'),
      keras.layers.Dense(1, activation=None)
  ])

  # TFCO by default uses hinge loss — and that will also be used in the model.
  model.compile(
      optimizer=tf.keras.optimizers.Adam(0.001),
      loss='hinge',
      metrics=METRICS)
  return model
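The model above outputs a real-valued score rather than a probability and is trained with hinge loss. As a rough illustration of what that loss computes, here is a plain-Python sketch (not the Keras implementation itself). It assumes 0/1 labels are mapped to -1/+1 before the loss is taken, which is what Keras documents for its hinge loss when given binary labels. The toy labels and scores are made up.

```python
def hinge_loss(labels, scores):
    """Mean hinge loss for 0/1 labels and real-valued scores."""
    total = 0.0
    for y, s in zip(labels, scores):
        y_signed = 2.0 * y - 1.0  # map {0, 1} -> {-1, +1}
        # No penalty once the score is on the correct side with margin >= 1.
        total += max(0.0, 1.0 - y_signed * s)
    return total / len(labels)


labels = [1.0, 0.0, 1.0, 0.0]   # made-up ground truth
scores = [2.0, -0.5, 0.3, 0.4]  # made-up model outputs
print(hinge_loss(labels, scores))
```

Only the first example incurs no loss: its score is positive with a margin of at least 1. The last example sits on the wrong side of zero and contributes the most.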

We also define a function to set seeds to ensure reproducible results. Note that this colab is meant as an educational tool and does not have the stability of a finely tuned production pipeline. Running without setting a seed may lead to varied results.

def set_seeds():
  np.random.seed(121212)
  tf.compat.v1.set_random_seed(212121)
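To illustrate why seeding matters, the sketch below uses Python's standard `random` module as a stand-in for the NumPy and TensorFlow generators seeded above. The principle is the same, but this is not how those libraries implement it, and the helper `sample_with_seed` is hypothetical.

```python
import random


def sample_with_seed(seed, n=3):
    """Draw n pseudo-random numbers from a private generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Same seed: identical draws. Different seed: a different sequence.
print(sample_with_seed(121212) == sample_with_seed(121212))
print(sample_with_seed(121212) == sample_with_seed(212121))
```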

Fairness Indicators Helper Functions

Before training our model, we define a number of helper functions that will allow us to evaluate the model's performance via Fairness Indicators.

First, we create a helper function to save our model once we train it.

def save_model(model, subdir):
  base_dir = tempfile.mkdtemp(prefix='saved_models')
  model_location = os.path.join(base_dir, subdir)
  model.save(model_location, save_format='tf')
  return model_location

Next, we define functions used to preprocess the data so that it is passed through to TFMA correctly.

Data Preprocessing functions for

def tfds_filepattern_for_split(dataset_name, split):
  return f"{local_test_file_full_prefix()}*"

class PreprocessCelebA(object):
  """Class that deserializes, decodes and applies additional preprocessing for CelebA input."""
  def __init__(self, dataset_name):
    builder = tfds.builder(dataset_name)
    self.features = builder.info.features
    example_specs = self.features.get_serialized_info()
    self.parser = tfds.core.example_parser.ExampleParser(example_specs)

  def __call__(self, serialized_example):
    # Deserialize
    deserialized_example = self.parser.parse_example(serialized_example)
    # Decode
    decoded_example = self.features.decode_example(deserialized_example)
    # Additional preprocessing
    image = decoded_example[IMAGE_KEY]
    label = decoded_example[ATTR_KEY][LABEL_KEY]
    # Resize and scale image.
    image = tf.cast(image, tf.float32)
    image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
    image /= 255.0
    image = tf.reshape(image, [-1])
    # Cast label and group to float32.
    label = tf.cast(label, tf.float32)

    group = decoded_example[ATTR_KEY][GROUP_KEY]

    output = tf.train.Example()
    output.features.feature[IMAGE_KEY].float_list.value.extend(image.numpy().tolist())
    output.features.feature[LABEL_KEY].float_list.value.append(label.numpy())
    output.features.feature[GROUP_KEY].bytes_list.value.append(b"Young" if group.numpy() else b'Not Young')
    return output.SerializeToString()

def tfds_as_pcollection(beam_pipeline, dataset_name, split):
  return (
      beam_pipeline
   | 'Read records' >> beam.io.ReadFromTFRecord(tfds_filepattern_for_split(dataset_name, split))
   | 'Preprocess' >> beam.Map(PreprocessCelebA(dataset_name))
  )

Finally, we define a function that evaluates the results in TFMA.

def get_eval_results(model_location, eval_subdir):
  base_dir = tempfile.mkdtemp(prefix='saved_eval_results')
  tfma_eval_result_path = os.path.join(base_dir, eval_subdir)

  eval_config_pbtxt = """
        model_specs {
          label_key: "%s"
        }
        metrics_specs {
          metrics {
            class_name: "FairnessIndicators"
            config: '{ "thresholds": [0.22, 0.5, 0.75] }'
          }
          metrics {
            class_name: "ExampleCount"
          }
        }
        slicing_specs {}
        slicing_specs { feature_keys: "%s" }
        options {
          compute_confidence_intervals { value: False }
          disabled_outputs{values: "analysis"}
        }
      """ % (LABEL_KEY, GROUP_KEY)

  eval_config = text_format.Parse(eval_config_pbtxt, tfma.EvalConfig())

  eval_shared_model = tfma.default_eval_shared_model(
        eval_saved_model_path=model_location, tags=[tf.saved_model.SERVING])

  schema_pbtxt = """
        tensor_representation_group {
          key: ""
          value {
            tensor_representation {
              key: "%s"
              value {
                dense_tensor {
                  column_name: "%s"
                  shape {
                    dim { size: 28 }
                    dim { size: 28 }
                    dim { size: 3 }
                  }
                }
              }
            }
          }
        }
        feature {
          name: "%s"
          type: FLOAT
        }
        feature {
          name: "%s"
          type: FLOAT
        }
        feature {
          name: "%s"
          type: BYTES
        }
        """ % (IMAGE_KEY, IMAGE_KEY, IMAGE_KEY, LABEL_KEY, GROUP_KEY)
  schema = text_format.Parse(schema_pbtxt, schema_pb2.Schema())
  coder = tf_example_record.TFExampleBeamRecord(
      physical_format='inmem', schema=schema,
      raw_record_column_name=tfma.ARROW_INPUT_COLUMN)
  tensor_adapter_config = tensor_adapter.TensorAdapterConfig(
    arrow_schema=coder.ArrowSchema(),
    tensor_representations=coder.TensorRepresentations())
  # Run the fairness evaluation.
  with beam.Pipeline() as pipeline:
    _ = (
          tfds_as_pcollection(pipeline, 'celeb_a', 'test')
          | 'ExamplesToRecordBatch' >> coder.BeamSource()
          | 'ExtractEvaluateAndWriteResults' >>
          tfma.ExtractEvaluateAndWriteResults(
              eval_config=eval_config,
              eval_shared_model=eval_shared_model,
              output_path=tfma_eval_result_path,
              tensor_adapter_config=tensor_adapter_config)
    )
  return tfma.load_eval_result(output_path=tfma_eval_result_path)

Train & Evaluate Unconstrained Model

With the model defined and the input pipeline in place, we are now ready to train our model. To cut back on execution time and memory, we will train the model by slicing the data into small batches with only a few repeated iterations.

Note that running this notebook in TensorFlow < 2.0.0 may result in a deprecation warning for np.where. You can safely ignore this warning, as TensorFlow addresses it in 2.X by using tf.where in place of np.where.
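As a quick back-of-the-envelope check of what "small batches with only a few iterations" amounts to, the sketch below multiplies out the values used in this section's training call (batch size 32, 1,000 steps per epoch, 5 epochs). Because the training dataset is built with `.repeat()`, an "epoch" here is defined by `steps_per_epoch` rather than by one full pass over the data. The helper name `examples_seen` is hypothetical.

```python
def examples_seen(batch_size, steps_per_epoch, epochs):
    """Total number of training examples drawn over the whole run."""
    return batch_size * steps_per_epoch * epochs


print(examples_seen(batch_size=32, steps_per_epoch=1000, epochs=5))  # 160000
```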

BATCH_SIZE = 32

# Set seeds to get reproducible results
set_seeds()

model_unconstrained = create_model()
model_unconstrained.fit(celeb_a_train_data_wo_group(BATCH_SIZE), epochs=5, steps_per_epoch=1000)

Epoch 1/5
1000/1000 [==============================] - 12s 6ms/step - loss: 0.5038 - accuracy: 0.7733
Epoch 2/5
1000/1000 [==============================] - 7s 7ms/step - loss: 0.3800 - accuracy: 0.8301
Epoch 3/5
1000/1000 [==============================] - 6s 6ms/step - loss: 0.3598 - accuracy: 0.8427
Epoch 4/5
1000/1000 [==============================] - 25s 25ms/step - loss: 0.3435 - accuracy: 0.8474
Epoch 5/5
1000/1000 [==============================] - 5s 5ms/step - loss: 0.3402 - accuracy: 0.8479
<keras.callbacks.History at 0x7f0f5c476350>

Evaluating the model on the test data should result in a final accuracy score of just over 85%. Not bad for a simple model with no fine tuning.

print('Overall Results, Unconstrained')
celeb_a_test_data = celeb_a_builder.as_dataset(split='test').batch(1).map(preprocess_input_dict).map(get_image_label_and_group)
results = model_unconstrained.evaluate(celeb_a_test_data)

Overall Results, Unconstrained
19962/19962 [==============================] - 50s 2ms/step - loss: 0.2125 - accuracy: 0.8636

However, performance evaluated across age groups may reveal some shortcomings.

To explore this further, we evaluate the model with Fairness Indicators (via TFMA). In particular, we are interested in whether there is a significant gap in performance between the "Young" and "Not Young" categories when evaluated on false positive rate.

A false positive error occurs when the model incorrectly predicts the positive class. In this context, a false positive occurs when the ground truth is an image of a celebrity 'Not Smiling' and the model predicts 'Smiling'. By extension, the false positive rate, the metric examined in the Fairness Indicators visualization in this notebook, is the fraction of actual negatives that the model incorrectly labels as positive. While this is a relatively mundane error to make in this context, false positive errors can sometimes cause more problematic behaviors. For instance, a false positive error in a spam classifier could cause a user to miss an important email.
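To make the metric concrete, here is a minimal plain-Python sketch of the false positive rate computed per group. It is independent of TFMA and Fairness Indicators; the helper names and the toy data are made up for illustration.

```python
def false_positive_rate(labels, predictions):
    """Fraction of actual negatives (label 0) that were predicted positive (1)."""
    negatives = [p for y, p in zip(labels, predictions) if y == 0]
    if not negatives:
        return float("nan")  # undefined when a slice has no negatives
    return sum(negatives) / len(negatives)


def fpr_by_group(labels, predictions, groups):
    """False positive rate computed separately for each group value."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = false_positive_rate(
            [labels[i] for i in idx], [predictions[i] for i in idx])
    return result


# Toy, made-up data: 1 = "Smiling", 0 = "Not Smiling".
labels      = [0, 0, 0, 0, 1, 0, 0, 1]
predictions = [1, 0, 0, 0, 1, 1, 1, 1]
groups      = ["Young"] * 5 + ["Not Young"] * 3

print(fpr_by_group(labels, predictions, groups))
```

In this toy data the "Not Young" slice has a much higher false positive rate than the "Young" slice, which is the kind of gap the evaluation in this section is looking for.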

model_location = save_model(model_unconstrained, 'model_export_unconstrained')
eval_results_unconstrained = get_eval_results(model_location, 'eval_results_unconstrained')

2022-01-07 18:46:05.881112: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: /tmp/saved_modelswhxcqdry/model_export_unconstrained/assets
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:107: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

As mentioned above, we are concentrating on the false positive rate. At the time of writing, Fairness Indicators (0.1.2) selects false negative rate by default. After running the line below, deselect false_negative_rate and select false_positive_rate to look at the metric we are interested in.

tfma.addons.fairness.view.widget_view.render_fairness_indicator(eval_results_unconstrained)

FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Young', 'slice': 'Young:Young', 'metrics': {'example_c…

As the results above show, there is a disproportionate gap between the "Young" and "Not Young" categories.

This is where TFCO can help, by constraining the false positive rate to be within a more acceptable range.

Constrained Model Set Up

As documented in TFCO's library, there are several helpers that make it easier to constrain the problem:

tfco.rate_context() - This is used to construct a constraint for each age group category.
tfco.RateMinimizationProblem() - The objective minimized here is the overall error rate, subject to a constraint on the false positive rate of an age group (in the code below, the "Not Young" subset). For this demonstration, the constraint is a false positive rate less than or equal to 5%.
tfco.ProxyLagrangianOptimizerV2() - This is the helper that actually solves the rate constraint problem.
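Written out, the problem these helpers set up can be sketched as follows. The notation is ours rather than TFCO's: $\theta$ stands for the model weights, $\mathrm{err}$ for the overall error rate, and $\mathrm{FPR}_{\text{Not Young}}$ for the false positive rate on the "Not Young" subset.

```latex
\begin{aligned}
\min_{\theta}\quad & \mathrm{err}(\theta) \\
\text{subject to}\quad & \mathrm{FPR}_{\text{Not Young}}(\theta) \le 0.05,
\qquad
\mathrm{FPR}_{\text{Not Young}}
  = \frac{\#\{\text{Not Young, not smiling, predicted smiling}\}}
         {\#\{\text{Not Young, not smiling}\}}
\end{aligned}
```

Under this reading, a constraint value of $\mathrm{FPR}_{\text{Not Young}}(\theta) - 0.05 > 0$ indicates a violation.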

The cell below calls on these helpers to set up model training with the fairness constraint.

# The batch size is needed to create the input, labels and group tensors.
# These tensors are initialized with all 0's. They will eventually be assigned
# the batch content to them. A large batch size is chosen so that there are
# enough number of "Young" and "Not Young" examples in each batch.
set_seeds()
model_constrained = create_model()
BATCH_SIZE = 32

# Create input tensor.
input_tensor = tf.Variable(
    np.zeros((BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, 3), dtype="float32"),
    name="input")

# Create labels and group tensors (assuming both labels and groups are binary).
labels_tensor = tf.Variable(
    np.zeros(BATCH_SIZE, dtype="float32"), name="labels")
groups_tensor = tf.Variable(
    np.zeros(BATCH_SIZE, dtype="float32"), name="groups")

# Create a function that returns the applied 'model' to the input tensor
# and generates constrained predictions.
def predictions():
  return model_constrained(input_tensor)

# Create overall context and subsetted context.
# The subsetted context contains subset of examples where group attribute < 1
# (i.e. the subset of "Not Young" celebrity images).
# "groups_tensor < 1" is used instead of "groups_tensor == 0" as the former
# would be a comparison on the tensor value, while the latter would be a
# comparison on the Tensor object.
context = tfco.rate_context(predictions, labels=lambda:labels_tensor)
context_subset = context.subset(lambda:groups_tensor < 1)

# Setup list of constraints.
# In this notebook, the constraint will just be: FPR to less or equal to 5%.
constraints = [tfco.false_positive_rate(context_subset) <= 0.05]

# Setup rate minimization problem: minimize overall error rate s.t. constraints.
problem = tfco.RateMinimizationProblem(tfco.error_rate(context), constraints)

# Create constrained optimizer and obtain train_op.
# Separate optimizers are specified for the objective and constraints
optimizer = tfco.ProxyLagrangianOptimizerV2(
      optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
      constraint_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
      num_constraints=problem.num_constraints)

# A list of all trainable variables is also needed to use TFCO.
var_list = (model_constrained.trainable_weights + list(problem.trainable_variables) +
            optimizer.trainable_variables())

The model is now set up and ready to be trained with the false positive rate constraint across age group.

Because the last iterate of the constrained model is not necessarily the best-performing one with respect to the defined constraint, the TFCO library comes equipped with tfco.find_best_candidate_index(), which can help choose the best iterate out of the snapshots recorded during training. Think of tfco.find_best_candidate_index() as an added heuristic that ranks each of the outcomes on accuracy and on the fairness constraint (in this case, false positive rate across age group) separately, with respect to the training data. That way, it can search for a better trade-off between overall accuracy and the fairness constraint.
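The ranking idea can be illustrated with a deliberately simplified plain-Python sketch. This is not TFCO's implementation and may differ from it in details such as tie-breaking. It assumes candidates are ranked separately on the objective and on each constraint violation (clipped at zero, so all satisfied constraints tie), and that the candidate with the smallest worst-case rank wins. The function names and numbers are made up.

```python
def rank(values):
    """Rank values in ascending order; equal values share the lowest rank."""
    ordered = sorted(values)
    return [ordered.index(v) for v in values]


def pick_candidate(objectives, violations):
    """Return the index whose worst rank across objective and constraints is smallest."""
    n = len(objectives)
    rank_columns = [rank(objectives)]
    for j in range(len(violations[0])):
        # Satisfied constraints (violation <= 0) are all treated as equally good.
        clipped = [max(0.0, violations[i][j]) for i in range(n)]
        rank_columns.append(rank(clipped))
    worst = [max(col[i] for col in rank_columns) for i in range(n)]
    # Break ties on the worst rank by preferring the lower objective.
    return min(range(n), key=lambda i: (worst[i], objectives[i]))


objectives = [0.10, 0.15, 0.30]          # made-up error rates per snapshot
violations = [[0.20], [0.01], [-0.02]]   # made-up constraint values per snapshot
print(pick_candidate(objectives, violations))
```

Here the middle snapshot is chosen: it is neither the most accurate nor the most feasible, but it is not the worst on either criterion.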

The following cell starts the training with constraints, snapshotting the model periodically so that the best-performing iterate can be selected afterwards.

# Obtain train set batches.

NUM_ITERATIONS = 100  # Number of training iterations.
SKIP_ITERATIONS = 10  # Print training stats once in this many iterations.

# Create temp directory for saving snapshots of models.
temp_directory = tempfile.mkdtemp()  # creates the directory atomically; mktemp() is deprecated

# List of objective and constraints across iterations.
objective_list = []
violations_list = []

# Training iterations.
iteration_count = 0
for (image, label, group) in celeb_a_train_data_w_group(BATCH_SIZE):
  # Assign current batch to input, labels and groups tensors.
  input_tensor.assign(image)
  labels_tensor.assign(label)
  groups_tensor.assign(group)

  # Run gradient update.
  optimizer.minimize(problem, var_list=var_list)

  # Record objective and violations.
  objective = problem.objective()
  violations = problem.constraints()

  sys.stdout.write(
      "\r Iteration %d: Hinge Loss = %.3f, Max. Constraint Violation = %.3f"
      % (iteration_count + 1, objective, max(violations)))

  # Snapshot model once in SKIP_ITERATIONS iterations.
  if iteration_count % SKIP_ITERATIONS == 0:
    objective_list.append(objective)
    violations_list.append(violations)

    # Save snapshot of model weights.
    model_constrained.save_weights(
        temp_directory + "/celeb_a_constrained_" +
        str(iteration_count / SKIP_ITERATIONS) + ".h5")

  iteration_count += 1
  if iteration_count >= NUM_ITERATIONS:
    break

# Choose best model from recorded iterates and load that model.
best_index = tfco.find_best_candidate_index(
    np.array(objective_list), np.array(violations_list))

model_constrained.load_weights(
    temp_directory + "/celeb_a_constrained_" + str(best_index) + ".0.h5")

# Remove temp directory.
os.system("rm -r " + temp_directory)

Iteration 100: Hinge Loss = 0.614, Max. Constraint Violation = 0.268
0

After having applied the constraint, we evaluate the results once again using Fairness Indicators.

model_location = save_model(model_constrained, 'model_export_constrained')
eval_result_constrained = get_eval_results(model_location, 'eval_results_constrained')

INFO:tensorflow:Assets written to: /tmp/saved_modelsbztxt9fy/model_export_constrained/assets
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.

As with the previous time we used Fairness Indicators, deselect false_negative_rate and select false_positive_rate to look at the metric we are interested in.

Note that to fairly compare the two versions of our model, it is important to use thresholds that set the overall false positive rate to be roughly equal. This ensures that we are looking at an actual change, as opposed to a shift in the model equivalent to simply moving the threshold boundary. In our case, comparing the unconstrained model at 0.5 and the constrained model at 0.22 provides a fair comparison for the models.
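The effect of the threshold on the overall false positive rate can be sketched in plain Python. The snippet uses made-up scores and the three thresholds from the evaluation config (0.22, 0.5, 0.75). The helper names are hypothetical, and this is only an illustration of the idea, not how Fairness Indicators computes its metrics.

```python
def fpr_at_threshold(labels, scores, threshold):
    """False positive rate when scores above `threshold` count as positive."""
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    return sum(1 for s in negatives if s > threshold) / len(negatives)


def closest_threshold(labels, scores, candidates, target_fpr):
    """Candidate threshold whose overall FPR is nearest to `target_fpr`."""
    return min(candidates,
               key=lambda t: abs(fpr_at_threshold(labels, scores, t) - target_fpr))


labels = [0, 0, 0, 0, 1, 1]                # made-up ground truth
scores = [0.1, 0.3, 0.6, 0.8, 0.7, 0.9]    # made-up model scores
candidates = [0.22, 0.5, 0.75]

for t in candidates:
    print(t, fpr_at_threshold(labels, scores, t))
print(closest_threshold(labels, scores, candidates, target_fpr=0.5))
```

Lowering the threshold raises the false positive rate for the same model, which is why two models should be compared at thresholds that give them a similar overall rate.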

eval_results_dict = {
    'constrained': eval_result_constrained,
    'unconstrained': eval_results_unconstrained,
}
tfma.addons.fairness.view.widget_view.render_fairness_indicator(multi_eval_results=eval_results_dict)

FairnessIndicatorViewer(evalName='constrained', evalNameCompare='unconstrained', slicingMetrics=[{'sliceValue'…

With TFCO's ability to express a more complex requirement as a rate constraint, we helped this model achieve a more desirable outcome with little impact on overall performance. There is, of course, still room for improvement, but at least TFCO was able to find a model that gets close to satisfying the constraint and reduces the disparity between the groups as much as possible.