Manages multiple checkpoints by keeping some and deleting unneeded ones.
```python
tf.train.CheckpointManager(
    checkpoint, directory, max_to_keep, keep_checkpoint_every_n_hours=None,
    checkpoint_name='ckpt', step_counter=None, checkpoint_interval=None,
    init_fn=None
)
```
Example usage:
```python
import tensorflow as tf

# Assumes `model` and `optimizer` have already been built, e.g. a tf.keras
# model and one of the tf.keras.optimizers classes.
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
status = checkpoint.restore(manager.latest_checkpoint)
while True:
    # train
    manager.save()
```
CheckpointManager preserves its own state across instantiations (see the
__init__ documentation for details). Only one should be active in a
particular directory at a time.
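For instance, a minimal sketch of this behavior (the directory path and the step variable are illustrative, not part of the API):

```python
import tensorflow as tf

# A trivial checkpointed object; any tf.train.Checkpoint works here.
step = tf.Variable(0, dtype=tf.int64)
checkpoint = tf.train.Checkpoint(step=step)

# First run: save a few checkpoints; only the 2 most recent are kept.
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/ckpt_demo", max_to_keep=2)
for _ in range(3):
    step.assign_add(1)
    manager.save()
print(manager.checkpoints)  # the 2 most recent checkpoint prefixes

# A later run: a new manager in the same directory picks up where the
# previous one left off (its state is read from the "checkpoint" file).
manager2 = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/ckpt_demo", max_to_keep=2)
print(manager2.latest_checkpoint)  # same as manager.latest_checkpoint
```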
| Args | |
|---|---|
| checkpoint | The tf.train.Checkpoint instance to save and manage checkpoints for. |
| directory | The path to a directory in which to write checkpoints. A special file named "checkpoint" is also written to this directory (in a human-readable text format) which contains the state of the CheckpointManager. |
| max_to_keep | An integer, the number of checkpoints to keep. Unless preserved by keep_checkpoint_every_n_hours, checkpoints will be deleted from the active set, oldest first, until only max_to_keep checkpoints remain. If None, no checkpoints are deleted and everything stays in the active set. Note that max_to_keep=None will keep all checkpoint paths in memory and in the checkpoint state protocol buffer on disk. |
| keep_checkpoint_every_n_hours | Upon removal from the active set, a checkpoint will be preserved if it has been at least keep_checkpoint_every_n_hours since the last preserved checkpoint. The default setting of None does not preserve any checkpoints in this way. |
| checkpoint_name | Custom name for the checkpoint file. |
| step_counter | A tf.Variable instance for checking the current step counter value, in case users want to save checkpoints every N steps. |
| checkpoint_interval | An integer, the minimum step interval between two checkpoints (see the sketch below). |
| init_fn | Callable. A function to do customized initialization if no checkpoints are in the directory. |
| Raises | |
|---|---|
| ValueError | If max_to_keep is not a positive integer. |
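A minimal sketch of how step_counter and checkpoint_interval can be combined to save at most once every N steps (directory, interval, and the training loop are illustrative):

```python
import tensorflow as tf

# Save at most once every 1000 steps, driven by a step counter variable.
step = tf.Variable(0, dtype=tf.int64)
checkpoint = tf.train.Checkpoint(step=step)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/interval_demo", max_to_keep=3,
    step_counter=step, checkpoint_interval=1000)

for _ in range(3000):
    step.assign_add(1)
    # With check_interval=True (the default), this is a no-op unless at
    # least 1000 steps have passed since the last saved checkpoint.
    manager.save()
```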
| Attributes | |
|---|---|
| checkpoint | Returns the tf.train.Checkpoint object. |
| checkpoint_interval | The minimum step interval between checkpoints, as passed to the constructor. |
| checkpoints | A list of managed checkpoints (see the sketch below). Note that checkpoints saved due to keep_checkpoint_every_n_hours will not show up in this list. |
| directory | The directory in which checkpoints are written. |
| latest_checkpoint | The prefix of the most recent checkpoint in directory. Equivalent to tf.train.latest_checkpoint(directory). Suitable for passing to tf.train.Checkpoint.restore to resume training. |
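A short, illustrative sketch of reading these attributes (directory and variable names are assumptions, not part of the API):

```python
import tensorflow as tf

step = tf.Variable(0, dtype=tf.int64)
checkpoint = tf.train.Checkpoint(step=step)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/attrs_demo", max_to_keep=3)
for _ in range(5):
    step.assign_add(1)
    manager.save()

print(manager.directory)          # "/tmp/attrs_demo"
print(manager.latest_checkpoint)  # newest prefix, e.g. ".../ckpt-5"
print(manager.checkpoints)        # active-set prefixes, oldest to newest

# Any listed prefix can be passed back to Checkpoint.restore, e.g. to roll
# back to the oldest checkpoint still kept:
checkpoint.restore(manager.checkpoints[0])
```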
Methods
restore_or_initialize
```python
restore_or_initialize()
```
Restore items in checkpoint from the latest checkpoint file.
This method will first try to restore from the most recent checkpoint in
directory. If no checkpoints exist in directory, and init_fn is
specified, this method will call init_fn to do customized
initialization. This can be used to support initialization from pretrained
models.
Note that unlike tf.train.Checkpoint.restore(), this method does not return
a load status object that users can run assertions on
(e.g. assert_consumed()). To run such assertions, use the
tf.train.Checkpoint.restore() method directly.
| Returns | |
|---|---|
| The restored checkpoint path if the latest checkpoint is found and restored. Otherwise None. |
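A minimal sketch of restore_or_initialize with a custom init_fn (the model, directory, and initialization logic are placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
checkpoint = tf.train.Checkpoint(model=model)

def init_from_pretrained():
    # Placeholder for customized initialization, e.g. loading pretrained weights.
    print("No checkpoint found; running custom initialization.")

manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/restore_demo", max_to_keep=3,
    init_fn=init_from_pretrained)

# Restores the latest checkpoint if one exists, otherwise calls init_fn.
restored_path = manager.restore_or_initialize()
if restored_path is None:
    print("Started fresh.")
else:
    print("Resumed from", restored_path)
```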
save
```python
save(
    checkpoint_number=None, check_interval=True
)
```
Creates a new checkpoint and manages it.
| Args | |
|---|---|
| checkpoint_number | An optional integer, or an integer-dtype Variable or Tensor, used to number the checkpoint. If None (default), checkpoints are numbered using checkpoint.save_counter. Even if checkpoint_number is provided, save_counter is still incremented. A user-provided checkpoint_number is not incremented even if it is a Variable. |
| check_interval | An optional boolean. The argument is only effective when checkpoint_interval is passed into the manager. If True, the manager will only save the checkpoint if the interval between checkpoints is larger than checkpoint_interval. Otherwise it will always save the checkpoint unless a checkpoint has already been saved for the current step (see the sketch after the Returns table). |
| Returns | |
|---|---|
| The path to the new checkpoint. It is also recorded in the checkpoints and latest_checkpoint properties. None if no checkpoint is saved. |
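A brief sketch of the save() arguments (directory, interval, and step values are illustrative):

```python
import tensorflow as tf

step = tf.Variable(0, dtype=tf.int64)
checkpoint = tf.train.Checkpoint(step=step)
manager = tf.train.CheckpointManager(
    checkpoint, directory="/tmp/save_demo", max_to_keep=3,
    step_counter=step, checkpoint_interval=1000)

# An explicit checkpoint_number overrides the automatic save_counter numbering.
path = manager.save(checkpoint_number=42)  # e.g. "/tmp/save_demo/ckpt-42"

step.assign_add(10)
# With check_interval=True (the default) this returns None, because fewer
# than 1000 steps have elapsed since the last checkpoint.
skipped = manager.save()

# check_interval=False forces a save, unless a checkpoint was already
# written for the current step value.
forced = manager.save(check_interval=False)
```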