View source on GitHub |
Manages multiple checkpoints by keeping some and deleting unneeded ones.
tf.train.CheckpointManager(
checkpoint,
directory,
max_to_keep,
keep_checkpoint_every_n_hours=None,
checkpoint_name='ckpt',
step_counter=None,
checkpoint_interval=None,
init_fn=None
)
Example usage:
import tensorflow as tf
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(
checkpoint, directory="/tmp/model", max_to_keep=5)
status = checkpoint.restore(manager.latest_checkpoint)
while True:
# train
manager.save()
CheckpointManager
preserves its own state across instantiations (see the
__init__
documentation for details). Only one should be active in a
particular directory at a time.
Args | |
---|---|
checkpoint
|
The tf.train.Checkpoint instance to save and manage
checkpoints for.
|
directory
|
The path to a directory in which to write checkpoints. A
special file named "checkpoint" is also written to this directory (in a
human-readable text format) which contains the state of the
CheckpointManager .
|
max_to_keep
|
An integer, the number of checkpoints to keep. Unless
preserved by keep_checkpoint_every_n_hours , checkpoints will be
deleted from the active set, oldest first, until only max_to_keep
checkpoints remain. If None , no checkpoints are deleted and everything
stays in the active set. Note that max_to_keep=None will keep all
checkpoint paths in memory and in the checkpoint state protocol buffer
on disk.
|
keep_checkpoint_every_n_hours
|
Upon removal from the active set, a
checkpoint will be preserved if it has been at least
keep_checkpoint_every_n_hours since the last preserved checkpoint. The
default setting of None does not preserve any checkpoints in this way.
|
checkpoint_name
|
Custom name for the checkpoint file. |
step_counter
|
A tf.Variable instance for checking the current step
counter value, in case users want to save checkpoints every N steps.
|
checkpoint_interval
|
An integer, indicates the minimum step interval between two checkpoints. |
init_fn
|
Callable. A function to do customized intialization if no checkpoints are in the directory. |
Raises | |
---|---|
ValueError
|
If max_to_keep is not a positive integer.
|
Attributes | |
---|---|
checkpoint
|
Returns the tf.train.Checkpoint object.
|
checkpoint_interval
|
|
checkpoints
|
A list of managed checkpoints.
Note that checkpoints saved due to |
directory
|
|
latest_checkpoint
|
The prefix of the most recent checkpoint in directory .
Equivalent to Suitable for passing to |
Methods
restore_or_initialize
restore_or_initialize()
Restore items in checkpoint
from the latest checkpoint file.
This method will first try to restore from the most recent checkpoint in
directory
. If no checkpoints exist in directory
, and init_fn
is
specified, this method will call init_fn
to do customized
initialization. This can be used to support initialization from pretrained
models.
Note that unlike tf.train.Checkpoint.restore()
, this method doesn't return
a load status object that users can run assertions on
(e.g. assert_consumed()). Thus to run assertions, users should directly use
tf.train.Checkpoint.restore()
method.
Returns | |
---|---|
The restored checkpoint path if the lastest checkpoint is found and restored. Otherwise None. |
save
save(
checkpoint_number=None, check_interval=True, options=None
)
Creates a new checkpoint and manages it.
Args | |
---|---|
checkpoint_number
|
An optional integer, or an integer-dtype Variable or
Tensor , used to number the checkpoint. If None (default),
checkpoints are numbered using checkpoint.save_counter . Even if
checkpoint_number is provided, save_counter is still incremented. A
user-provided checkpoint_number is not incremented even if it is a
Variable .
|
check_interval
|
An optional boolean. The argument is only effective when
checkpoint_interval is passed into the manager. If True , the manager
will only save the checkpoint if the interval between checkpoints is
larger than checkpoint_interval . Otherwise it will always save the
checkpoint unless a checkpoint has already been saved for the current
step.
|
options
|
Optional tf.train.CheckpointOptions object. This argument only
works with TF2 checkpoint objects. For example, options =
tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
|
Returns | |
---|---|
The path to the new checkpoint. It is also recorded in the checkpoints
and latest_checkpoint properties. None if no checkpoint is saved.
|
sync
sync()
Wait for any outstanding save or restore operations.