Configures TensorFlow ops to run deterministically.
When op determinism is enabled, TensorFlow ops will be deterministic. This means that if an op is run multiple times with the same inputs on the same hardware, it will have the exact same outputs each time. This is useful for debugging models. Note that determinism in general comes at the expense of lower performance and so your model may run slower when op determinism is enabled.
If you want your TensorFlow program to run deterministically, call tf.keras.utils.set_random_seed and tf.config.experimental.enable_op_determinism near the start of your program.
tf.keras.utils.set_random_seed sets the Python seed, the NumPy seed,
and the TensorFlow seed. Setting these seeds is necessary to ensure any random
numbers your program generates are also deterministic.
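A minimal sketch of that setup follows. The seed value 1 is an arbitrary choice for illustration; any fixed integer should work. The closing check, which reseeds and compares two draws, is an addition for demonstration and is not something a program needs.

```python
import tensorflow as tf

# Seed the Python, NumPy, and TensorFlow PRNGs in one call.
# The value 1 is arbitrary; what matters is that it is fixed.
tf.keras.utils.set_random_seed(1)

# Ask TensorFlow to use deterministic op implementations.
tf.config.experimental.enable_op_determinism()

# Illustration only: reseeding with the same value reproduces the same draw.
first = tf.random.normal((3,))
tf.keras.utils.set_random_seed(1)
second = tf.random.normal((3,))
tf.debugging.assert_equal(first, second)
```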
By default, op determinism is not enabled, so ops might return different results when run with the same inputs. These differences are often caused by the use of asynchronous threads within the op nondeterministically changing the order in which floating-point numbers are added. Most of these cases of nondeterminism occur on GPUs, which have thousands of hardware threads that are used to run ops. Enabling determinism directs such ops to use a different algorithm, one that does not use threads in a nondeterministic way.
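The effect of summation order can be seen without TensorFlow at all. The sketch below is plain Python and only illustrates why reordering floating-point additions changes the result; it does not reproduce how any particular GPU kernel schedules its threads.

```python
# Floating-point addition is not associative, so the grouping of the
# additions changes the rounded result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)    # 0.6000000000000001 0.6
```

When thousands of threads accumulate partial sums in whatever order they happen to finish, the grouping varies from run to run, and so can the last bits of the output.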
Another potential source of nondeterminism is tf.data based data processing. Typically, this can introduce nondeterminism due to the use of parallelism in methods such as Dataset.map producing inputs or running stateful ops in a nondeterministic order. Enabling determinism will remove such sources of nondeterminism.
Enabling determinism will likely make your model or your tf.data processing slower. For example, Dataset.map can become several orders of magnitude slower when the map function has random ops or other stateful ops. See the “Determinism and tf.data” section below for more details. In future TensorFlow releases, we plan on improving the performance of determinism, especially for common scenarios.
Certain ops will raise an
UnimplementedError because they do not yet have a
deterministic implementation. Additionally, due to bugs, some ops might be
nondeterministic and not raise an
UnimplementedError. If you encounter such
ops, please file an issue.
An example of enabling determinism follows. The
tf.nn.softmax_cross_entropy_with_logits op is run multiple times and the
output is shown to be the same each time. This example would likely fail when
run on a GPU if determinism were not enabled, because
tf.nn.softmax_cross_entropy_with_logits uses a nondeterministic algorithm on
GPUs by default.
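    import tensorflow as tf

    # Seed the PRNGs and enable determinism before running any ops.
    tf.keras.utils.set_random_seed(1)
    tf.config.experimental.enable_op_determinism()

    labels = tf.random.normal((1, 10000))
    logits = tf.random.normal((1, 10000))
    output = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                     logits=logits)
    # Recompute the loss several times; each result must match the first.
    for _ in range(5):
        output2 = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                          logits=logits)
        tf.debugging.assert_equal(output, output2)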
Writing deterministic models
You can make your models deterministic by enabling op determinism. This means that you can train a model and finish each run with exactly the same trainable variables. This also means that the inferences of your previously-trained model will be exactly the same on each run. Typically, models can be made deterministic by simply setting the seeds and enabling op determinism, as in the example above. However, to guarantee that your model operates deterministically, you must meet all the following requirements:
- Call tf.config.experimental.enable_op_determinism(), as mentioned above.
- Reproducibly reset any pseudorandom number generators (PRNGs) you’re using,
such as by setting the seeds for the default PRNGs in TensorFlow, Python,
and NumPy, as mentioned above. Note that certain newer NumPy classes like
numpy.random.default_rng ignore the global NumPy seed, so a seed must be explicitly passed to such classes, if used.
- Use the same hardware configuration in every run.
- Use the same software environment in every run (OS, checkpoints, version of CUDA and TensorFlow, environment variables, etc.). Note that determinism is not guaranteed across different versions of TensorFlow.
- Do not use constructs outside TensorFlow that are nondeterministic, such as
/dev/random or using multiple threads/processes in ways that influence TensorFlow’s behavior.
- Ensure your input pipeline is deterministic. If you use
tf.data, this is done automatically (at the expense of performance). See "Determinism and tf.data" below for more information.
- Do not use tf.compat.v1.Session or tf.distribute.experimental.ParameterServerStrategy, which can introduce nondeterminism. Besides ops (including tf.data ops), these are the only known potential sources of nondeterminism within TensorFlow (if you find more, please file an issue). Note that tf.compat.v1.Session is required to use the TF1 API, so determinism cannot be guaranteed when using the TF1 API.
- Do not use nondeterministic custom ops.
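The NumPy caveat in the list above is easy to miss, so here is a short sketch of it. It uses only NumPy, and the seed value 1 is an arbitrary choice for illustration.

```python
import numpy as np

# The legacy global generator honors np.random.seed.
np.random.seed(1)
legacy_a = np.random.rand(3)
np.random.seed(1)
legacy_b = np.random.rand(3)
print(np.array_equal(legacy_a, legacy_b))  # True

# default_rng() ignores the global seed: without an explicit seed it pulls
# fresh entropy from the OS, so two generators produce different streams.
np.random.seed(1)
unseeded_a = np.random.default_rng().random(3)
np.random.seed(1)
unseeded_b = np.random.default_rng().random(3)
print(np.array_equal(unseeded_a, unseeded_b))  # False, with overwhelming probability

# Passing the seed explicitly makes the new-style generator reproducible.
seeded_a = np.random.default_rng(1).random(3)
seeded_b = np.random.default_rng(1).random(3)
print(np.array_equal(seeded_a, seeded_b))  # True
```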
Additional details on determinism
For stateful ops to be deterministic, the state of the system must be the same every time the op is run. For example, the output of an op that reads a variable depends on both the variable’s value and the op’s parameters. When determinism is enabled, the side effects of stateful ops are deterministic.
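As an illustration of state dependence, the sketch below reads a variable with tf.gather before and after updating it. The choice of tf.gather and the values are assumptions made for this example; the point is only that the same call with the same arguments returns a different result once the variable's state has changed.

```python
import tensorflow as tf

v = tf.Variable([1.0, 2.0, 3.0])
indices = [0, 2]

# Same op, same indices: the result depends on the variable's current state.
before = tf.gather(v, indices).numpy()

# assign_add is a stateful op; its side effect changes what later reads see.
v.assign_add([10.0, 10.0, 10.0])
after = tf.gather(v, indices).numpy()

print(before)  # [1. 3.]
print(after)   # [11. 13.]
```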
TensorFlow’s random ops, such as tf.random.normal, will raise a RuntimeError if determinism is enabled and a seed has not been set. However, attempting to generate nondeterministic random numbers using Python or NumPy will not raise such errors. Make sure you remember to set the Python and NumPy seeds. Using tf.keras.utils.set_random_seed is an easy way to set all three seeds.
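Because Python's own generator never complains about a missing seed, the problem is silent. A minimal sketch using only the standard library random module (the seed value 1 is arbitrary):

```python
import random

# An unseeded draw runs without error but differs from one program run
# to the next.
unseeded = random.random()

# Seeding first makes the sequence reproducible.
random.seed(1)
first = [random.random() for _ in range(3)]
random.seed(1)
second = [random.random() for _ in range(3)]

print(first == second)  # True
```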
Note that latency, memory consumption, throughput, and other performance
characteristics are not made deterministic by enabling op determinism.
Only op outputs and side effects are made deterministic. Additionally, a model
may nondeterministically raise a
tf.errors.ResourceExhaustedError from a
lack of memory due to the fact that memory consumption is nondeterministic.
Determinism and tf.data
Enabling deterministic ops makes
tf.data deterministic in several ways:
1. For dataset methods with a deterministic argument, the deterministic argument is overridden to be True irrespective of its setting.
2. The tf.data.Options.experimental_deterministic option is overridden to be True irrespective of its setting.
3. In Dataset.map and Dataset.interleave, if the map or interleave function has stateful random ops or other stateful ops, the function will run serially instead of in parallel. This means the num_parallel_calls argument to map and interleave is effectively ignored.
4. Prefetching with Dataset.prefetch will be disabled if any function run as part of the input pipeline has certain stateful ops. Similarly, any dataset method with a num_parallel_calls argument will be made to run serially if any function in the input pipeline has such stateful ops. Legacy random ops such as tf.random.normal will not cause such datasets to be changed, but most other stateful ops will.
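The first point in the list can be sketched as follows. The pipeline explicitly requests nondeterministic ordering with deterministic=False, yet with op determinism enabled the elements are expected to come back in input order. The dataset contents, the doubling function, and num_parallel_calls=4 are arbitrary choices for illustration.

```python
import tensorflow as tf

tf.keras.utils.set_random_seed(1)
tf.config.experimental.enable_op_determinism()

# deterministic=False would normally permit out-of-order results, but op
# determinism overrides the argument to True.
dataset = tf.data.Dataset.range(8).map(
    lambda x: x * 2,
    num_parallel_calls=4,
    deterministic=False,
)

result = [int(x) for x in dataset]
print(result)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

The map function here is stateless, so it can still run in parallel; only the output ordering is pinned.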
Unfortunately, due to (3), performance can be greatly reduced when stateful ops are used in Dataset.map, because the map function no longer runs in parallel. Common examples of stateful ops used in Dataset.map are random
ops, such as