Help protect the Great Barrier Reef with TensorFlow on Kaggle Join Challenge


Enable tensor numerics checking in an eager/graph unified fashion.

The numerics checking mechanism will cause any TensorFlow eager execution or graph execution to error out as soon as an op's output tensor contains infinity or NaN.

This method is idempotent. Calling it multiple times has the same effect as calling it once.

This method takes effect only on the thread in which it is called.

When a op's float-type output tensor contains any Infinity or NaN, an tf.errors.InvalidArgumentError will be thrown, with an error message that reveals the following information:

  • The type of the op that generated the tensor with bad numerics.
  • Data type (dtype) of the tensor.
  • Shape of the tensor (to the extent known at the time of eager execution or graph construction).
  • Name of the containing graph (if available).
  • (Graph mode only): The stack trace of the intra-graph op's creation, with a stack-height limit and a path-length limit for visual clarity. The stack frames that belong to the user's code (as opposed to tensorflow's internal code) are highlighted with a text arrow ("->").
  • (Eager mode only): How many of the offending tensor's elements are Infinity and NaN, respectively.

Once enabled, the check-numerics mechanism can be disabled by using tf.debugging.disable_check_numerics().

Example usage:

  1. Catching infinity during the execution of a tf.function graph:

    import tensorflow as tf
    def square_log_x_plus_1(x):
      v = tf.math.log(x + 1)
      return tf.math.square(v)
    x = -1.0
    # When the following line runs, a function graph will be compiled
    # from the Python function `square_log_x_plus_1()`. Due to the
    # `enable_check_numerics()` call above, the graph will contain
    # numerics checking ops that will run during the function graph's
    # execution. The function call generates an -infinity when the Log
    # (logarithm) op operates on the output tensor of the Add op.
    # The program errors out at this line, printing an error message.
    y = square_log_x_plus_1(x)
    z = -y
  2. Catching NaN during eager execution:

    import numpy as np
    import tensorflow as tf
    x = np.array([[0.0, -1.0], [4.0, 3.0]])
    # The following line executes the Sqrt op eagerly. Due to the negative
    # element in the input array, a NaN is generated. Due to the
    # `enable_check_numerics()` call above, the program errors immediately
    # at this line, printing an error message.
    y = tf.math.sqrt(x)
    z = tf.matmul(y, y)

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
  # ...

stack_height_limit Limit to the height of the printed stack trace. Applicable only to ops in tf.functions (graphs).
path_length_limit Limit to the file path included in the printed stack trace. Applicable only to ops in tf.functions (graphs).