A CheckpointManager that also exports SavedModels.
tfm.core.savedmodel_checkpoint_manager.SavedModelCheckpointManager(
checkpoint: tf.train.Checkpoint,
directory: str,
max_to_keep: int,
modules_to_export: Optional[Mapping[str, tf.Module]] = None,
keep_checkpoint_every_n_hours: Optional[int] = None,
checkpoint_name: str = 'ckpt',
step_counter: Optional[tf.Variable] = None,
checkpoint_interval: Optional[int] = None,
init_fn: Optional[Callable[[], None]] = None
)
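A rough usage sketch of the constructor follows. The keyword names come from the signature above; the `import tensorflow_models as tfm` alias, the helper name `build_manager`, the `"model"` export key, and the values `max_to_keep=3` and `checkpoint_interval=100` are assumptions chosen for illustration.

```python
def build_manager(model, directory, step_counter):
    """Builds a SavedModelCheckpointManager for `model` (illustrative helper).

    Args:
      model: a tf.Module (for example a tf.keras.Model) to checkpoint and export.
      directory: path where checkpoints and SavedModels are written.
      step_counter: a tf.Variable holding the global step.
    """
    # Imported inside the function so this sketch can be loaded even when
    # TensorFlow and the Model Garden package are not installed.
    import tensorflow as tf
    import tensorflow_models as tfm  # assumed import alias

    checkpoint = tf.train.Checkpoint(model=model)
    return tfm.core.savedmodel_checkpoint_manager.SavedModelCheckpointManager(
        checkpoint=checkpoint,
        directory=directory,
        max_to_keep=3,                       # illustrative value
        modules_to_export={"model": model},  # hypothetical export key
        step_counter=step_counter,
        checkpoint_interval=100,             # illustrative value
    )
```

A `step_counter` is passed together with `checkpoint_interval` because the interval check compares against the current step.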
Attributes | |
---|---|
checkpoint | Returns the tf.train.Checkpoint object. |
checkpoint_interval | |
checkpoints | A list of managed checkpoints. Note that checkpoints saved due to |
directory | |
latest_checkpoint | The prefix of the most recent checkpoint in directory. Equivalent to Suitable for passing to |
latest_savedmodel | The path of the most recent SavedModel in directory. |
modules_to_export | |
savedmodels | A list of managed SavedModels. |
Methods
get_existing_savedmodels
get_existing_savedmodels() -> List[str]
Gets a list of all existing SavedModel paths in directory.
Returns | |
---|---|
A list of all existing SavedModel paths. |
get_savedmodel_number_from_path
get_savedmodel_number_from_path(
savedmodel_path: str
) -> Union[int, None]
Gets the savedmodel_number (checkpoint_number) from a SavedModel file path.
The savedmodel_number is the global step when used with the Orbit controller.
Args | |
---|---|
savedmodel_path | SavedModel directory path. |
Returns | |
---|---|
The SavedModel number, or None if no matching pattern is found in the SavedModel path. |
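To make the "number or None" contract concrete, here is a minimal stand-alone sketch of this kind of path parsing. It is not the library's implementation: the function name `parse_savedmodel_number` and the assumed directory naming (`<prefix>-<number>_saved_modules`) are hypothetical and are not stated on this page.

```python
import re
from typing import Optional

# Assumed directory suffix, for illustration only.
ASSUMED_SUFFIX = "saved_modules"


def parse_savedmodel_number(savedmodel_path: str,
                            suffix: str = ASSUMED_SUFFIX) -> Optional[int]:
    """Returns the number preceding `_<suffix>` at the end of the path, or None."""
    # Match "<digits>_<suffix>" at the end of the path, allowing a trailing slash.
    match = re.search(rf"(\d+)_{re.escape(suffix)}/?$", savedmodel_path)
    return int(match.group(1)) if match else None
```

Under the assumed naming, a path ending in `ckpt-100_saved_modules` parses to `100`, and a path without the suffix yields `None`.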
restore_or_initialize
restore_or_initialize()
Restores items in checkpoint from the latest checkpoint file.
This method first tries to restore from the most recent checkpoint in directory. If no checkpoints exist in directory and init_fn is specified, this method calls init_fn to do customized initialization. This can be used to support initialization from pretrained models.
Note that unlike tf.train.Checkpoint.restore(), this method doesn't return a load status object that users can run assertions on (e.g. assert_consumed()). To run assertions, call tf.train.Checkpoint.restore() directly.
Returns | |
---|---|
The restored checkpoint path if the latest checkpoint is found and restored. Otherwise None. |
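The control flow described above can be sketched in plain Python. This is an illustration of the decision logic only, not the library's code; the function name and the `restore_fn` parameter are hypothetical stand-ins for the manager's internals.

```python
from typing import Callable, Optional


def restore_or_initialize_sketch(
    latest_checkpoint: Optional[str],
    restore_fn: Callable[[str], None],
    init_fn: Optional[Callable[[], None]] = None,
) -> Optional[str]:
    """Restores from `latest_checkpoint` if present, else runs `init_fn`."""
    if latest_checkpoint is not None:
        # A checkpoint exists: restore it and report its path.
        restore_fn(latest_checkpoint)
        return latest_checkpoint
    if init_fn is not None:
        # No checkpoint: fall back to custom initialization,
        # for example loading pretrained weights.
        init_fn()
    return None
```

The return value alone tells the caller which branch ran: a path means a restore happened, and None means either init_fn ran or nothing was done.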
save
save(
checkpoint_number: Optional[int] = None,
check_interval: bool = True,
options: Optional[tf.train.CheckpointOptions] = None
)
See base class.
savedmodels_iterator
savedmodels_iterator(
min_interval_secs: float = 0,
timeout: Optional[float] = None,
timeout_fn: Optional[Callable[[], bool]] = None
)
Continuously yields new SavedModel files as they appear.
The iterator only checks for new savedmodels when control flow has returned to it. The logic is the same as in tf.train.checkpoints_iterator.
Args | |
---|---|
min_interval_secs | The minimum number of seconds between yielding savedmodels. |
timeout | The maximum number of seconds to wait between savedmodels. If left as None, then the process will wait indefinitely. |
timeout_fn | Optional function to call after a timeout. If the function returns True, then it means that no new savedmodels will be generated and the iterator will exit. The function is called with no arguments. |
Yields | |
---|---|
String paths to latest SavedModel files as they arrive. |
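How min_interval_secs, timeout, and timeout_fn interact can be sketched with a plain generator. This is a simplified model of the documented behavior, not the library's implementation; `iterate_new_paths` and its `wait_fn` parameter (a callable that returns the next new path, or None on timeout) are hypothetical.

```python
import time
from typing import Callable, Iterator, Optional


def iterate_new_paths(
    wait_fn: Callable[[Optional[str], Optional[float]], Optional[str]],
    min_interval_secs: float = 0,
    timeout: Optional[float] = None,
    timeout_fn: Optional[Callable[[], bool]] = None,
) -> Iterator[str]:
    """Yields new paths from `wait_fn` until a timeout ends the iteration."""
    last_path = None
    while True:
        new_path = wait_fn(last_path, timeout)
        if new_path is None:
            # Timed out. Stop unless timeout_fn says more paths may still come.
            if timeout_fn is None or timeout_fn():
                return
            continue
        start = time.monotonic()
        last_path = new_path
        yield new_path
        # Enforce the minimum spacing between two yields.
        remaining = start + min_interval_secs - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
```

Because the generator only resumes when the consumer asks for the next item, any path that appears and is superseded while the consumer is busy is skipped, which matches the note that new savedmodels are only checked when control returns to the iterator.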
sync
sync()
Wait for any outstanding save or restore operations.
wait_for_new_savedmodel
wait_for_new_savedmodel(
last_savedmodel: Optional[str] = None,
seconds_to_sleep: float = 1.0,
timeout: Optional[float] = None
) -> Union[str, None]
Waits until a new savedmodel file is found.
Args | |
---|---|
last_savedmodel | The last savedmodel path used or None if we're expecting a savedmodel for the first time. |
seconds_to_sleep | The number of seconds to sleep for before looking for a new savedmodel. |
timeout | The maximum number of seconds to wait. If left as None, then the process will wait indefinitely. |
Returns | |
---|---|
A new savedmodel path, or None if the timeout was reached. |
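The polling behavior implied by these arguments can be sketched as follows. This is a minimal model under stated assumptions, not the library's code: `wait_for_new_path` and its `get_latest` parameter (a callable returning the most recent path, or None if there is none yet) are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_new_path(
    get_latest: Callable[[], Optional[str]],
    last_path: Optional[str] = None,
    seconds_to_sleep: float = 1.0,
    timeout: Optional[float] = None,
) -> Optional[str]:
    """Polls `get_latest` until it returns a path different from `last_path`."""
    stop_time = None if timeout is None else time.monotonic() + timeout
    while True:
        latest = get_latest()
        if latest is not None and latest != last_path:
            return latest
        # Give up if the next sleep would run past the deadline.
        if stop_time is not None and time.monotonic() + seconds_to_sleep > stop_time:
            return None
        time.sleep(seconds_to_sleep)
```

With timeout left as None the loop never gives up, so callers that need to shut down cleanly would pass a finite timeout and handle the None result.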