tf.train.CheckpointManager

Manages multiple checkpoints by keeping some and deleting unneeded ones.

Used in the notebooks

Used in the guide Used in the tutorials

Example usage:

import tensorflow as tf
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
status = checkpoint.restore(manager.latest_checkpoint)
while True:
  # train
  manager.save()

CheckpointManager preserves its own state across instantiations (see the __init__ documentation for details). Only one should be active in a particular directory at a time.

checkpoint The tf.train.Checkpoint instance to save and manage checkpoints for.
directory The path to a directory in which to write checkpoints. A special file named "checkpoint" is also written to this directory (in a human-readable text format) which contains the state of the CheckpointManager.
max_to_keep An integer, the number of checkpoints to keep. Unless preserved by keep_checkpoint_every_n_hours, checkpoints