Migrating tf.summary usage to TF 2.x

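Note: This document is for users who are already familiar with TensorFlow 1.x TensorBoard and want to migrate a large TensorFlow codebase from TensorFlow 1.x to 2.x. If you are new to TensorBoard, see the get started document instead. If you use tf.keras, you may not need to take any action to upgrade to TensorFlow 2.x.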

import tensorflow as tf

TensorFlow 2.x includes significant changes to the tf.summary API, which is used to write summary data for visualization in TensorBoard.

What's changed

It is useful to think of the tf.summary API as two sub-APIs:

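A set of ops for recording individual summaries (summary.scalar(), summary.histogram(), summary.image(), summary.audio(), and summary.text()), which are called inline from your model code.
Writing logic that collects these individual summaries and writes them to a specially formatted log file (which TensorBoard then reads to generate visualizations).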

In TF 1.x

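The two halves had to be wired together manually, by fetching the summary op outputs via Session.run() and calling FileWriter.add_summary(output, step). The v1.summary.merge_all() op made this easier by using a graph collection to aggregate all summary op outputs, but this approach still worked poorly with eager execution and control flow, which makes it especially ill-suited to TF 2.x.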
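For comparison with the TF 2.x examples that follow, the manual TF 1.x wiring described above can be sketched as below. This is a minimal illustration rather than code from any particular model: the temporary log directory, the placeholder named value, and the three-step loop are assumptions made for the sketch, and it uses the tf.compat.v1 endpoints so that it can run under a TF 2.x installation.

```python
import os
import tempfile

import tensorflow as tf

logdir = tempfile.mkdtemp()  # hypothetical log directory for this sketch

g = tf.compat.v1.Graph()
with g.as_default():
  # Recording half: a summary op defined inline with the (stand-in) model code.
  value = tf.compat.v1.placeholder(tf.float32, shape=[])
  tf.compat.v1.summary.scalar("my_metric", value)
  # merge_all() aggregates every summary op registered in the graph collection.
  merged = tf.compat.v1.summary.merge_all()

  # Writing half: a FileWriter that must be fed the fetched outputs by hand.
  file_writer = tf.compat.v1.summary.FileWriter(logdir, graph=g)
  with tf.compat.v1.Session(graph=g) as sess:
    for step in range(3):
      output = sess.run(merged, feed_dict={value: 0.5})
      file_writer.add_summary(output, step)
  file_writer.close()

print(os.listdir(logdir))
```

Note that nothing is written unless the fetched output is passed to add_summary() explicitly, which is the coupling that TF 2.x removes.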

In TF 2.x

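The two halves are tightly integrated: individual tf.summary ops now write their data immediately when executed. Using the API from your model code should still look familiar, but it is now friendly to eager execution while remaining compatible with graph mode. Integrating the two halves means that the summary.FileWriter is now part of the TensorFlow execution context and is accessed directly by tf.summary ops, so configuring writers is the main part that looks different.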

Example usage with eager execution, the default in TF 2.x:

writer = tf.summary.create_file_writer("/tmp/mylogs/eager")

with writer.as_default():
  for step in range(100):
    # other model code would go here
    tf.summary.scalar("my_metric", 0.5, step=step)
    writer.flush()

ls /tmp/mylogs/eager

events.out.tfevents.1699378309.kokoro-gcp-ubuntu-prod-26365401.28722.0.v2

Example usage with tf.function graph execution:

writer = tf.summary.create_file_writer("/tmp/mylogs/tf_function")

@tf.function
def my_func(step):
  with writer.as_default():
    # other model code would go here
    tf.summary.scalar("my_metric", 0.5, step=step)

for step in tf.range(100, dtype=tf.int64):
  my_func(step)
  writer.flush()

ls /tmp/mylogs/tf_function

events.out.tfevents.1699378310.kokoro-gcp-ubuntu-prod-26365401.28722.1.v2

Example usage with legacy TF 1.x graph execution:

g = tf.compat.v1.Graph()
with g.as_default():
  step = tf.Variable(0, dtype=tf.int64)
  step_update = step.assign_add(1)
  writer = tf.summary.create_file_writer("/tmp/mylogs/session")
  with writer.as_default():
    tf.summary.scalar("my_metric", 0.5, step=step)
  all_summary_ops = tf.compat.v1.summary.all_v2_summary_ops()
  writer_flush = writer.flush()


with tf.compat.v1.Session(graph=g) as sess:
  sess.run([writer.init(), step.initializer])

  for i in range(100):
    sess.run(all_summary_ops)
    sess.run(step_update)
    sess.run(writer_flush)

ls /tmp/mylogs/session

events.out.tfevents.1699378311.kokoro-gcp-ubuntu-prod-26365401.28722.2.v2

Converting your code

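Converting existing tf.summary usage to the TF 2.x API cannot be reliably automated, so the tf_upgrade_v2 script just rewrites it all to tf.compat.v1.summary and does not enable the TF 2.x behaviors automatically.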

Partial migration

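To make migration to TF 2.x easier for users whose model code still depends heavily on the TF 1.x summary API logging ops, such as tf.compat.v1.summary.scalar(), it is possible to migrate only the writer APIs first. The individual TF 1.x summary ops inside the model code can then be fully migrated at a later point.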

To support this style of migration, tf.compat.v1.summary automatically forwards to its TF 2.x equivalents under the following conditions:

The outermost context is eager mode
A default TF 2.x summary writer has been set
A non-empty step value has been set for the writer (using tf.summary.SummaryWriter.as_default, tf.summary.experimental.set_step, or tf.compat.v1.train.create_global_step)

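Note that when the TF 2.x summary implementation is invoked, the return value is an empty bytestring tensor, to avoid writing the summary twice. In addition, input argument forwarding is best-effort and not all arguments are preserved (for example, the family argument is supported, whereas collections is dropped).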

Example that invokes tf.summary.scalar behavior from tf.compat.v1.summary.scalar:

# Enable eager execution.
tf.compat.v1.enable_v2_behavior()

# A default TF 2.x summary writer is available.
writer = tf.summary.create_file_writer("/tmp/mylogs/enable_v2_in_v1")
# A step is set for the writer.
with writer.as_default(step=0):
  # Below invokes `tf.summary.scalar`, and the return value is an empty bytestring.
  tf.compat.v1.summary.scalar('float', tf.constant(1.0), family="family")

Full migration

To fully migrate to TF 2.x, you need to adapt your code as follows:

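A default writer set via .as_default() must be present to use summary ops
- This applies both when ops execute eagerly and when they are used in graph construction
- Without a default writer, summary ops become silent no-ops
- Default writers do not (yet) propagate across the @tf.function execution boundary (they are only detected when the function is traced), so the best practice is to call writer.as_default() within the function body and to ensure that the writer object continues to exist for as long as the @tf.function is in use
The "step" value must be passed into each op via the step argument
- TensorBoard requires a step value to render the data as a time series
- Explicit passing is necessary because the global step from TF 1.x has been removed, so each op must be told which step variable to read
- To reduce boilerplate, experimental support for registering a default step value is available as tf.summary.experimental.set_step(), but this is provisional functionality that may change without notice
Function signatures of individual summary ops have changed
- The return value is now a boolean (indicating whether a summary was actually written)
- The second parameter name (if used) has changed from tensor to data
- The collections parameter has been removed; collections are TF 1.x only
- The family parameter has been removed; just use tf.name_scope()
[Only for legacy graph mode / session execution users]
- First initialize the writer with v1.Session.run(writer.init())
- Use v1.summary.all_v2_summary_ops() to get all TF 2.x summary ops for the current graph, for example to execute them via Session.run()
- Flush the writer with v1.Session.run(writer.flush()), and do likewise for close()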
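Several of the adjustments listed above can be seen together in a short eager-mode sketch. It is only an illustration under stated assumptions: the temporary log directory, the "train" scope, the "loss" tag, and the numeric values are hypothetical, and tf.summary.experimental.set_step() is the provisional API mentioned in the list, so it may change.

```python
import tempfile

import tensorflow as tf

logdir = tempfile.mkdtemp()  # hypothetical log directory for this sketch
writer = tf.summary.create_file_writer(logdir)

# No default writer is set yet, so the op is a silent no-op that reports False.
wrote_without_writer = bool(tf.summary.scalar("loss", data=0.5, step=0))

with writer.as_default():
  # tf.name_scope() takes over the grouping role of the removed `family` argument.
  with tf.name_scope("train"):
    # The value is passed as `data` (formerly `tensor`), and `step` is explicit.
    wrote_with_writer = bool(tf.summary.scalar("loss", data=0.5, step=0))

  # Provisional alternative: register a default step instead of passing `step`.
  tf.summary.experimental.set_step(1)
  wrote_with_default_step = bool(tf.summary.scalar("loss", data=0.4))
  tf.summary.experimental.set_step(None)  # clear the default step again

writer.flush()
print(wrote_without_writer, wrote_with_writer, wrote_with_default_step)
```

Checking the boolean return value, as done here, is one way to notice during migration that a summary op is running without a default writer.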

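If your TF 1.x code was instead using the tf.contrib.summary API, which is much more similar to the TF 2.x API, the tf_upgrade_v2 script can automate most of the migration steps (and emits warnings or errors for any usage that cannot be fully migrated). For the most part it just rewrites the API calls to tf.compat.v2.summary; if you only need compatibility with TF 2.x, you can drop the compat.v2 and reference it as tf.summary.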

Additional tips

In addition to the critical areas above, some auxiliary aspects have also changed:

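Conditional recording (such as "log every 100 steps") has a new look
- To control ops and associated code, wrap them in a regular if statement (which works in eager mode, and in @tf.function via AutoGraph) or in a tf.cond
- To control just summaries, use the new tf.summary.record_if() context manager and pass it the boolean condition of your choosing
- These replace the TF 1.x pattern: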
```
if condition:
  writer.add_summary()
```
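No direct writing of tf.compat.v1.Graph; use trace functions instead
- Graph execution in TF 2.x uses @tf.function instead of an explicit graph
- In TF 2.x, use the new tracing-style APIs tf.summary.trace_on() and tf.summary.trace_export() to record executed function graphs
No more global writer caching per logdir with tf.summary.FileWriterCache
- Users should either implement their own caching/sharing of writer objects, or just use separate writers (TensorBoard support for the latter is in progress)
The event file binary representation has changed
- TensorBoard 1.x already supports the new format; this change only affects users who manually parse summary data from event files
- Summary data is now stored as tensor bytes; you can use tf.make_ndarray(event.summary.value[0].tensor) to convert it to NumPy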
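The notes on conditional recording and on the event file format can be tried together in a small sketch: it records a scalar only on every fifth step using tf.summary.record_if(), then parses the event file by hand with tf.make_ndarray(). The temporary log directory, the ten-step loop, and the recorded values are assumptions made for illustration, and tf.compat.v1.train.summary_iterator is used here as one available way to read event files, not the only one.

```python
import glob
import os
import tempfile

import tensorflow as tf

logdir = tempfile.mkdtemp()  # hypothetical log directory for this sketch
writer = tf.summary.create_file_writer(logdir)

with writer.as_default():
  for step in range(10):
    # Only the summary is gated; any other code in the loop would still run.
    with tf.summary.record_if(step % 5 == 0):
      tf.summary.scalar("my_metric", step + 0.5, step=step)
writer.close()

# Manually parse the event file: each summary value now carries a tensor proto.
recorded = {}
for path in glob.glob(os.path.join(logdir, "events.out.tfevents.*")):
  for event in tf.compat.v1.train.summary_iterator(path):
    for value in event.summary.value:
      recorded[event.step] = float(tf.make_ndarray(value.tensor))

print(recorded)
```

Only the steps for which the condition held should appear in recorded; the other calls to tf.summary.scalar() are skipped without raising an error.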