Better performance with tf.function


In TensorFlow 2, eager execution is turned on by default. The user interface is intuitive and flexible (running one-off operations is much easier and faster), but this can come at the expense of performance and deployability.

To get performant and portable models, use tf.function to make graphs out of your programs. However, there are pitfalls to be wary of: tf.function is not a magical make-it-faster bullet!

This document will help you conceptualize what tf.function is doing under the hood, so that you can master its use.

The main takeaways and recommendations are:

  • Debug in Eager mode, then decorate with @tf.function.
  • Don't rely on Python side effects like object mutation or list appends.
  • tf.function works best with TensorFlow ops; NumPy and Python calls are converted to constants.

Setup

import tensorflow as tf

Define a helper function to demonstrate the kinds of errors you might encounter:

import traceback
import contextlib

# Some helper code to demonstrate the kinds of errors you might encounter.
@contextlib.contextmanager
def assert_raises(error_class):
  try:
    yield
  except error_class as e:
    print('Caught expected exception \n  {}:'.format(error_class))
    traceback.print_exc(limit=2)
  except Exception as e:
    raise e
  else:
    raise Exception('Expected {} to be raised but no error was raised!'.format(
        error_class))

Basics

A tf.function you define is just like a core TensorFlow operation: You can execute it eagerly; you can compute gradients; and so on.

@tf.function
def add(a, b):
  return a + b

add(tf.ones([2, 2]), tf.ones([2, 2]))  #  [[2., 2.], [2., 2.]]
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 2.],
       [2., 2.]], dtype=float32)>
v = tf.Variable(1.0)
with tf.GradientTape() as tape:
  result = add(v, 1.0)
tape.gradient(result, v)
<tf.Tensor: shape=(), dtype=float32, numpy=1.0>

You can use functions inside functions.

@tf.function
def dense_layer(x, w, b):
  return add(tf.matmul(x, w), b)

dense_layer(tf.ones([3, 2]), tf.ones([2, 2]), tf.ones([2]))
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[3., 3.],
       [3., 3.],
       [3., 3.]], dtype=float32)>

Functions can be faster than eager code, especially for graphs with many small ops. But for graphs with a few expensive ops (like convolutions), you may not see much speedup.

import timeit
conv_layer = tf.keras.layers.Conv2D(100, 3)

@tf.function
def conv_fn(image):
  return conv_layer(image)

image = tf.zeros([1, 200, 200, 100])
# warm up
conv_layer(image); conv_fn(image)
print("Eager conv:", timeit.timeit(lambda: conv_layer(image), number=10))
print("Function conv:", timeit.timeit(lambda: conv_fn(image), number=10))
print("Note how there's not much difference in performance for convolutions")

Eager conv: 0.004822793000016645
Function conv: 0.0029242999999041785
Note how there's not much difference in performance for convolutions
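
For contrast, here is a rough sketch (not part of the original benchmark) of a function made of many small ops, where the per-op Python overhead of eager execution is more noticeable:

small = tf.random.uniform([10, 10])

def many_small_ops(x):
  # 100 tiny matmuls: eager mode pays Python dispatch overhead for each one.
  for _ in range(100):
    x = tf.matmul(x, small) * 0.5
  return x

many_small_fn = tf.function(many_small_ops)
many_small_fn(small)  # warm up (traces the function)
print("Eager small ops:", timeit.timeit(lambda: many_small_ops(small), number=10))
print("Function small ops:", timeit.timeit(lambda: many_small_fn(small), number=10))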

Debugging

In general, debugging code is easier in Eager mode than inside a tf.function. You should ensure that your code executes error-free in Eager mode before decorating with tf.function. To assist in the debugging process, you can call tf.config.run_functions_eagerly(True) to globally disable tf.function (and call it again with False to re-enable it).
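
For example, a minimal sketch (not part of the original notebook) of toggling eager execution for debugging:

@tf.function
def debuggable_square(x):
  print("Runs on every call while functions are forced to run eagerly")
  return x * x

tf.config.run_functions_eagerly(True)   # run tf.functions eagerly for debugging
debuggable_square(tf.constant(2.0))
debuggable_square(tf.constant(3.0))     # the print fires again: no graph, no cache
tf.config.run_functions_eagerly(False)  # re-enable graph execution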

When tracking down issues that only appear within tf.function, here are some tips:

  • Plain old Python print calls only execute during tracing, helping you track down when your functions get (re)traced.
  • tf.print calls will execute every time, and can help you track down intermediate values during execution.
  • tf.debugging.enable_check_numerics is an easy way to track down where NaNs and Inf are created (see the sketch after this list).
  • pdb can help you understand what's going on during tracing. (Caveat: PDB will drop you into AutoGraph-transformed source code.)
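
For example, here is a minimal sketch of tf.debugging.enable_check_numerics, using the assert_raises helper defined above:

tf.debugging.enable_check_numerics()

@tf.function
def log_of_negative(x):
  return tf.math.log(x)  # log of a negative number produces NaN

# With check-numerics enabled, the op that first produces a NaN or Inf raises
# an error pointing at the offending op and source line.
with assert_raises(tf.errors.InvalidArgumentError):
  log_of_negative(tf.constant(-1.0))

tf.debugging.disable_check_numerics()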

Tracing and polymorphism

Python's dynamic typing means that you can call functions with a variety of argument types, and Python will do something different in each scenario.

On the other hand, TensorFlow graphs require static dtypes and shape dimensions. tf.function bridges this gap by retracing the function when necessary to generate the correct graphs. Most of the subtlety of tf.function usage stems from this retracing behavior.

You can call a function with arguments of different types to see what is happening.

# Functions are polymorphic

@tf.function
def double(a):
  print("Tracing with", a)
  return a + a

print(double(tf.constant(1)))
print()
print(double(tf.constant(1.1)))
print()
print(double(tf.constant("a")))
print()

Tracing with Tensor("a:0", shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)

Tracing with Tensor("a:0", shape=(), dtype=float32)
tf.Tensor(2.2, shape=(), dtype=float32)

Tracing with Tensor("a:0", shape=(), dtype=string)
tf.Tensor(b'aa', shape=(), dtype=string)


To control the tracing behavior, you can use the following techniques:

Create a new tf.function. Separate tf.function objects are guaranteed not to share traces.

def f():
  print('Tracing!')
  tf.print('Executing')

tf.function(f)()
tf.function(f)()
Tracing!
Executing
Tracing!
Executing

Use the get_concrete_function method to get a specific trace.

print("Obtaining concrete trace")
double_strings = double.get_concrete_function(tf.TensorSpec(shape=None, dtype=tf.string))
print("Executing traced function")
print(double_strings(tf.constant("a")))
print(double_strings(a=tf.constant("b")))
print("Using a concrete trace with incompatible types will throw an error")
with assert_raises(tf.errors.InvalidArgumentError):
  double_strings(tf.constant(1))
Obtaining concrete trace
Tracing with Tensor("a:0", dtype=string)
Executing traced function
tf.Tensor(b'aa', shape=(), dtype=string)
tf.Tensor(b'bb', shape=(), dtype=string)
Using a concrete trace with incompatible types will throw an error
Caught expected exception 
  <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
    yield
  File "<ipython-input-10-5351d0a2eda2>", line 8, in <module>
    double_strings(tf.constant(1))
tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute __inference_double_183 as input #0(zero-based) was expected to be a string tensor but is a int32 tensor [Op:__inference_double_183]

Specify input_signature in tf.function to limit tracing.

@tf.function(input_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),))
def next_collatz(x):
  print("Tracing with", x)
  return tf.where(x % 2 == 0, x // 2, 3 * x + 1)

print(next_collatz(tf.constant([1, 2])))
# We specified a 1-D tensor in the input signature, so this should fail.
with assert_raises(ValueError):
  next_collatz(tf.constant([[1, 2], [3, 4]]))

Tracing with Tensor("x:0", shape=(None,), dtype=int32)
tf.Tensor([4 1], shape=(2,), dtype=int32)
Caught expected exception 
  <class 'ValueError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
    yield
  File "<ipython-input-11-9939c82c1507>", line 9, in <module>
    next_collatz(tf.constant([[1, 2], [3, 4]]))
ValueError: Python inputs incompatible with input_signature:
  inputs: (
    tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32))
  input_signature: (
    TensorSpec(shape=(None,), dtype=tf.int32, name=None))

When to retrace?

A polymorphic tf.function keeps a cache of concrete functions generated by tracing. The cache keys are effectively tuples of keys generated from the function args and kwargs. The key generated for a tf.Tensor argument is its dtype and shape. The key generated for a Python primitive is its value. For all other Python types, the keys are based on the object id(), so that methods are traced independently for each instance of a class. In the future, TensorFlow may add more sophisticated caching for Python objects that can be safely converted to tensors.
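
As a quick illustration (a minimal sketch, not from the original notebook): Python integers are keyed by value, while tensors of the same dtype and shape share a single trace.

@tf.function
def square(x):
  print("Tracing with", x)
  return x * x

# Python ints are part of the cache key by value: each new value retraces.
square(2)
square(3)

# Tensors are keyed by dtype and shape: these two calls share one trace.
square(tf.constant(2))
square(tf.constant(3))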

See Concrete functions for more details.

Python or Tensor args?

Often, Python arguments are used to control hyperparameters and graph construction, for example num_layers=10, training=True, or nonlinearity='relu'. So if the Python argument changes, it makes sense that you'd have to retrace the graph.

However, it's possible that a Python argument is not being used to control graph construction. In these cases, a change in the Python value can trigger needless retracing. Take, for example, this training loop, which AutoGraph will dynamically unroll. Despite the multiple traces, the generated graph is actually identical, so this is a bit inefficient.

def train_one_step():
  pass

@tf.function
def train(num_steps):
  print("Tracing with num_steps = {}".format(num_steps))
  for _ in tf.range(num_steps):
    train_one_step()

train(num_steps=10)
train(num_steps=20)

Tracing with num_steps = 10
Tracing with num_steps = 20

The simple workaround here is to cast your arguments to Tensors if they do not affect the shape of the generated graph.

train(num_steps=tf.constant(10))
train(num_steps=tf.constant(20))
Tracing with num_steps = Tensor("num_steps:0", shape=(), dtype=int32)

Side effects in tf.function

In general, Python side effects (like printing or mutating objects) only happen during tracing. So how can you reliably trigger side effects from tf.function?

The general rule of thumb is to only use Python side effects to debug your traces. Otherwise, TensorFlow ops like tf.Variable.assign, tf.print, and tf.summary are the best way to ensure your code will be traced and executed by the TensorFlow runtime with each call. In general using a functional style will yield the best results.

@tf.function
def f(x):
  print("Traced with", x)
  tf.print("Executed with", x)

f(1)
f(1)
f(2)

Traced with 1
Executed with 1
Executed with 1
Traced with 2
Executed with 2

If you would like to execute Python code during each invocation of a tf.function, tf.py_function is an escape hatch. The drawback of tf.py_function is that it's not portable or particularly performant, and it does not work well in distributed (multi-GPU, TPU) setups. Also, since tf.py_function has to be wired into the graph for differentiability, it casts all inputs/outputs to tensors.

external_list = []

def side_effect(x):
  print('Python side effect')
  external_list.append(x)

@tf.function
def f(x):
  tf.py_function(side_effect, inp=[x], Tout=[])

f(1)
f(1)
f(1)
assert len(external_list) == 3
# .numpy() call required because py_function casts 1 to tf.constant(1)
assert external_list[0].numpy() == 1

Python side effect
Python side effect
Python side effect

Beware of Python state

Many Python features, such as generators and iterators, rely on the Python runtime to keep track of state. In general, while these constructs work as expected in Eager mode, many unexpected things can happen inside a tf.function due to tracing behavior.

To give one example, advancing iterator state is a Python side effect and therefore only happens during tracing.

external_var = tf.Variable(0)
@tf.function
def buggy_consume_next(iterator):
  external_var.assign_add(next(iterator))
  tf.print("Value of external_var:", external_var)

iterator = iter([0, 1, 2, 3])
buggy_consume_next(iterator)
# This reuses the first value from the iterator, rather than consuming the next value.
buggy_consume_next(iterator)
buggy_consume_next(iterator)

Value of external_var: 0
Value of external_var: 0
Value of external_var: 0
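
One way to avoid this pitfall (a sketch, not part of the original notebook) is to pass each value in as a tensor argument, so that the update happens at execution time rather than trace time:

external_var.assign(0)

@tf.function
def consume_value(value):
  external_var.assign_add(value)
  tf.print("Value of external_var:", external_var)

# Passing tensors keeps the work inside the graph; values 0, 1, 2, 3 accumulate.
for value in [0, 1, 2, 3]:
  consume_value(tf.constant(value))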

Variables

tf.function uses the same idea of following the code's intended execution order to make variable creation and use straightforward. There is one very important caveat, though: with variables, it's possible to write code that behaves differently in eager mode and graph mode.

Specifically, this will happen when you create a new Variable with each call. Due to tracing semantics, tf.function will reuse the same variable each call, but eager mode will create a new variable with each call. To guard against this mistake, tf.function will raise an error if it detects dangerous variable creation behavior.

@tf.function
def f(x):
  v = tf.Variable(1.0)
  v.assign_add(x)
  return v

with assert_raises(ValueError):
  f(1.0)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Caught expected exception 
  <class 'ValueError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
    yield
  File "<ipython-input-17-73e410646579>", line 8, in <module>
    f(1.0)
ValueError: in user code:

    <ipython-input-17-73e410646579>:3 f  *
        v = tf.Variable(1.0)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:261 __call__  **
        return cls._variable_v2_call(*args, **kwargs)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:255 _variable_v2_call
        shape=shape)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:66 getter
        return captured_getter(captured_previous, **kwargs)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py:511 invalid_creator_scope
        "tf.function-decorated function tried to create "

    ValueError: tf.function-decorated function tried to create variables on non-first call.


Unambiguous code is OK, though.

v = tf.Variable(1.0)

@tf.function
def f(x):
  return v.assign_add(x)

print(f(1.0))  # 2.0
print(f(2.0))  # 4.0

tf.Tensor(2.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)

You can also create variables inside a tf.function, as long as you can prove that those variables are created only the first time the function is executed.

class C:
  pass

obj = C()
obj.v = None

@tf.function
def g(x):
  if obj.v is None:
    obj.v = tf.Variable(1.0)
  return obj.v.assign_add(x)

print(g(1.0))  # 2.0
print(g(2.0))  # 4.0
tf.Tensor(2.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)

Variable initializers can depend on function arguments and on values of other variables. We can figure out the right initialization order using the same method we use to generate control dependencies.

state = []
@tf.function
def fn(x):
  if not state:
    state.append(tf.Variable(2.0 * x))
    state.append(tf.Variable(state[0] * 3.0))
  return state[0] * x * state[1]

print(fn(tf.constant(1.0)))
print(fn(tf.constant(3.0)))

tf.Tensor(12.0, shape=(), dtype=float32)
tf.Tensor(36.0, shape=(), dtype=float32)

AutoGraph Transformations

AutoGraph is a library that is on by default in tf.function, and transforms a subset of Python Eager code into graph-compatible TensorFlow ops. This includes control flow like if, for, while.

TensorFlow ops like tf.cond and tf.while_loop continue to work, but control flow is often easier to write and understand when written in Python.

# Simple loop

@tf.function
def f(x):
  while tf.reduce_sum(x) > 1:
    tf.print(x)
    x = tf.tanh(x)
  return x

f(tf.random.uniform([5]))
[0.340402246 0.141038418 0.802301645 0.41956234 0.955718398]
[0.327836454 0.140110627 0.665321589 0.396561682 0.74236095]
[0.316575378 0.139200941 0.581894 0.377003163 0.630569458]
[0.306407064 0.138308764 0.524040699 0.360102057 0.558444262]
[0.297164947 0.137433544 0.480812579 0.345303923 0.506822228]
[0.288716 0.13657476 0.446894139 0.332204252 0.46746552]
[0.280952573 0.135731891 0.419342667 0.3205 0.43614924]
[0.273786485 0.134904459 0.39637652 0.309958935 0.410447478]
[0.26714462 0.134091988 0.376844317 0.300399721 0.388852566]
[0.260965914 0.133294046 0.359963804 0.291678369 0.370370626]
[0.255198747 0.1325102 0.345182151 0.28367883 0.354315847]
[0.249799296 0.131740034 0.332095921 0.27630648 0.340197474]
[0.244729981 0.130983159 0.320402771 0.269483209 0.327653676]
[0.239958405 0.130239189 0.309871048 0.263143897 0.316410929]
[0.235456467 0.129507765 0.300319761 0.257233769 0.306258053]
[0.231199622 0.128788546 0.291605204 0.251706362 0.297029078]
[0.227166384 0.128081188 0.283611566 0.246522 0.288591444]
[0.223337784 0.127385378 0.276244342 0.241646513 0.280837864]
[0.219697058 0.126700804 0.269425571 0.23705034 0.273680359]
[0.216229305 0.126027152 0.263090253 0.232707739 0.267046064]
[0.212921217 0.12536414 0.257183671 0.228596181 0.260874063]
[0.209760889 0.124711499 0.251659453 0.224695832 0.255112886]
[0.206737623 0.124068953 0.246477932 0.220989183 0.2497188]
[0.203841776 0.12343625 0.241605014 0.217460677 0.244654313]
[0.201064616 0.122813135 0.237011179 0.214096457 0.239887089]

<tf.Tensor: shape=(5,), dtype=float32, numpy=
array([0.19839825, 0.12219938, 0.23267071, 0.21088414, 0.2353891 ],
      dtype=float32)>

If you're curious, you can inspect the code AutoGraph generates.

print(tf.autograph.to_code(f.python_function))
def tf__f(x):
    do_return = False
    retval_ = ag__.UndefinedReturnValue()
    with ag__.FunctionScope('f', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:

        def get_state():
            return (x,)

        def set_state(loop_vars):
            nonlocal x
            (x,) = loop_vars

        def loop_body():
            nonlocal x
            ag__.converted_call(tf.print, (x,), None, fscope)
            x = ag__.converted_call(tf.tanh, (x,), None, fscope)

        def loop_test():
            return (ag__.converted_call(tf.reduce_sum, (x,), None, fscope) > 1)
        ag__.while_stmt(loop_test, loop_body, get_state, set_state, ('x',), {})
        try:
            do_return = True
            retval_ = fscope.mark_return_value(x)
        except:
            do_return = False
            raise
    (do_return,)
    return ag__.retval(retval_)


Conditionals

AutoGraph will convert some if <condition> statements into the equivalent tf.cond calls. This substitution is made if <condition> is a Tensor. Otherwise, the if statement is executed as a Python conditional.

A Python conditional executes during tracing, so exactly one branch of the conditional will be added to the graph. Without AutoGraph, this traced graph would be unable to take the alternate branch if there is data-dependent control flow.

tf.cond traces and adds both branches of the conditional to the graph, dynamically selecting a branch at execution time. Tracing can have unintended side effects; see AutoGraph tracing effects for more.

@tf.function
def fizzbuzz(n):
  for i in tf.range(1, n + 1):
    print('Tracing for loop')
    if i % 15 == 0:
      print('Tracing fizzbuzz branch')
      tf.print('fizzbuzz')
    elif i % 3 == 0:
      print('Tracing fizz branch')
      tf.print('fizz')
    elif i % 5 == 0:
      print('Tracing buzz branch')
      tf.print('buzz')
    else:
      print('Tracing default branch')
      tf.print(i)

fizzbuzz(tf.constant(5))
fizzbuzz(tf.constant(20))
Tracing for loop
Tracing fizzbuzz branch
Tracing fizz branch
Tracing buzz branch
Tracing default branch
1
2
fizz
4
buzz
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
17
fizz
19
buzz

See the reference documentation for additional restrictions on AutoGraph-converted if statements.

Loops

AutoGraph will convert some for and while statements into the equivalent TensorFlow looping ops, like tf.while_loop. If not converted, the for or while loop is executed as a Python loop.

This substitution is made in the following situations:

  • for x in y: if y is a Tensor, convert to tf.while_loop. In the special case where y is a tf.data.Dataset, a combination of tf.data.Dataset ops are generated.
  • while <condition>: if <condition> is a Tensor, convert to tf.while_loop.

A Python loop executes during tracing, adding additional ops to the tf.Graph for every iteration of the loop.

A TensorFlow loop traces the body of the loop, and dynamically selects how many iterations to run at execution time. The loop body only appears once in the generated tf.Graph.
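
To make the difference concrete, here is a small sketch (not from the original notebook) comparing a Python loop with a loop over tf.range:

@tf.function
def python_loop():
  for i in range(3):  # Python loop: unrolled during tracing
    print("Tracing Python loop iteration", i)
    tf.print("Python loop step", i)

@tf.function
def tensor_loop():
  for i in tf.range(3):  # TensorFlow loop: body traced once, run dynamically
    print("Tracing tf.range loop body")
    tf.print("tf.range step", i)

python_loop()  # the tracing print appears three times
tensor_loop()  # the tracing print appears once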

See the reference documentation for additional restrictions on AutoGraph-converted for and while statements.

Looping over Python data

A common pitfall is to loop over Python/Numpy data within a tf.function. This loop will execute during the tracing process, adding a copy of your model to the tf.Graph for each iteration of the loop.

If you want to wrap the entire training loop in tf.function, the safest way to do this is to wrap your data as a tf.data.Dataset so that AutoGraph will dynamically unroll the training loop.

def measure_graph_size(f, *args):
  g = f.get_concrete_function(*args).graph
  print("{}({}) contains {} nodes in its graph".format(
      f.__name__, ', '.join(map(str, args)), len(g.as_graph_def().node)))

@tf.function
def train(dataset):
  loss = tf.constant(0)
  for x, y in dataset:
    loss += tf.abs(y - x) # Some dummy computation.
  return loss

small_data = [(1, 1)] * 3
big_data = [(1, 1)] * 10
measure_graph_size(train, small_data)
measure_graph_size(train, big_data)

measure_graph_size(train, tf.data.Dataset.from_generator(
    lambda: small_data, (tf.int32, tf.int32)))
measure_graph_size(train, tf.data.Dataset.from_generator(
    lambda: big_data, (tf.int32, tf.int32)))
train([(1, 1), (1, 1), (1, 1)]) contains 11 nodes in its graph
train([(1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)]) contains 32 nodes in its graph
train(<FlatMapDataset shapes: (<unknown>, <unknown>), types: (tf.int32, tf.int32)>) contains 8 nodes in its graph
train(<FlatMapDataset shapes: (<unknown>, <unknown>), types: (tf.int32, tf.int32)>) contains 8 nodes in its graph

When wrapping Python/Numpy data in a Dataset, be mindful of tf.data.Dataset.from_generator versus tf.data.Dataset.from_tensors. The former will keep the data in Python and fetch it via tf.py_function which can have performance implications, whereas the latter will bundle a copy of the data as one large tf.constant() node in the graph, which can have memory implications.
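
As a rough sketch (the numpy_data array here is made up for illustration), the two wrapping approaches look like this:

import numpy as np

numpy_data = np.random.rand(1000, 10).astype(np.float32)

# from_generator keeps the data in Python; elements are fetched through
# tf.py_function at runtime (flexible, but adds Python overhead).
ds_from_generator = tf.data.Dataset.from_generator(
    lambda: numpy_data, output_types=tf.float32, output_shapes=(10,))

# from_tensors embeds a copy of the data as one large tf.constant node in
# the graph (fast to read, but increases graph and memory size).
ds_from_tensors = tf.data.Dataset.from_tensors(numpy_data).unbatch()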

Reading data from files via TFRecordDataset/CsvDataset/etc. is the most effective way to consume data, as then TensorFlow itself can manage the asynchronous loading and prefetching of data, without having to involve Python. To learn more, see the tf.data guide.
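
For example, a minimal file-based pipeline might look like the following sketch (the file names are hypothetical):

dataset = tf.data.TFRecordDataset(["train-00000.tfrecord", "train-00001.tfrecord"])
dataset = (dataset
           .shuffle(buffer_size=1000)
           .batch(32)
           .prefetch(tf.data.experimental.AUTOTUNE))  # overlap loading with compute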

Accumulating values in a loop

A common pattern is to accumulate intermediate values from a loop. Normally, this is accomplished by appending to a Python list or adding entries to a Python dictionary. However, as these are Python side effects, they will not work as expected in a dynamically unrolled loop. Use tf.TensorArray to accumulate results from a dynamically unrolled loop.

batch_size = 2
seq_len = 3
feature_size = 4

def rnn_step(inp, state):
  return inp + state

@tf.function
def dynamic_rnn(rnn_step, input_data, initial_state):
  # [batch, time, features] -> [time, batch, features]
  input_data = tf.transpose(input_data, [1, 0, 2])
  max_seq_len = input_data.shape[0]

  states = tf.TensorArray(tf.float32, size=max_seq_len)
  state = initial_state
  for i in tf.range(max_seq_len):
    state = rnn_step(input_data[i], state)
    states = states.write(i, state)
  return tf.transpose(states.stack(), [1, 0, 2])
  
dynamic_rnn(rnn_step,
            tf.random.uniform([batch_size, seq_len, feature_size]),
            tf.zeros([batch_size, feature_size]))
<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
array([[[0.9482862 , 0.8276074 , 0.05674112, 0.32767332],
        [1.6937461 , 1.0797887 , 0.58639705, 0.6134182 ],
        [1.9751238 , 1.5447042 , 0.7830416 , 1.0239146 ]],

       [[0.32333958, 0.72486174, 0.6798818 , 0.6224296 ],
        [0.98534155, 1.4966407 , 0.9999871 , 0.8372327 ],
        [1.0591766 , 2.0142398 , 1.479817  , 1.7913209 ]]], dtype=float32)>

Further reading

To learn more about graph optimizations that are performed after tracing a tf.function, see the Grappler guide. To learn how to optimize your data pipeline and profile your model, see the Profiler guide.