Training helper that restores from checkpoint and creates session.
tf.compat.v1.train.SessionManager(
    local_init_op=None,
    ready_op=None,
    ready_for_local_init_op=None,
    graph=None,
    recovery_wait_secs=30,
    local_init_run_options=None,
    local_init_feed_dict=None
)
This class is a small wrapper that takes care of session creation and checkpoint recovery. It also provides functions that facilitate coordination among multiple training threads or processes.
- Checkpointing trained variables as the training progresses.
- Initializing variables on startup, restoring them from the most recent checkpoint after a crash, or waiting for checkpoints to become available.
with tf.Graph().as_default():
  ...add operations to the graph...
  # Create a SessionManager that will checkpoint the model in '/tmp/mydir'.
  sm = SessionManager()
  sess = sm.prepare_session(master, init_op, saver, checkpoint_dir)
  # Use the session to train the graph.
  while True:
    sess.run(<my_train_op>)
prepare_session() initializes or restores a model. It requires init_op and saver as arguments.
A second process could wait for the model to be ready by doing the following:
with tf.Graph().as_default():
  ...add operations to the graph...
  # Create a SessionManager that will wait for the model to become ready.
  sm = SessionManager()
  sess = sm.wait_for_session(master)
  # Use the session to train the graph.
  while True:
    sess.run(<my_train_op>)
wait_for_session() waits for a model to be initialized by other processes.
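The split between the initializing process and the waiting processes can be mimicked in plain Python with threads and an event (purely illustrative; no TensorFlow involved, and the names here are made up for the sketch):

```python
import threading

model_ready = threading.Event()  # stands in for the "model is ready" check

def trainer():
    # Role of prepare_session(): initialize the model, then signal readiness.
    model_ready.set()

def waiter(results):
    # Role of wait_for_session(): block until another worker has
    # initialized the model, then proceed to use it.
    model_ready.wait(timeout=5)
    results.append(model_ready.is_set())

results = []
t = threading.Thread(target=waiter, args=(results,))
t.start()
trainer()
t.join()
```

In the real API the readiness signal is the ready_op reporting that all variables are initialized, not an in-process event.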
Args:
- recovery_wait_secs: Seconds between checks for the model to be ready.
- local_init_run_options: RunOptions to be passed to session.run when executing the local_init_op.
- local_init_feed_dict: Optional session feed dictionary to use when running the local_init_op.
Raises:
- ValueError: If ready_for_local_init_op is not None but local_init_op is None.
prepare_session(
    master,
    init_op=None,
    saver=None,
    checkpoint_dir=None,
    checkpoint_filename_with_path=None,
    wait_for_checkpoint=False,
    max_wait_secs=7200,
    config=None,
    init_feed_dict=None,
    init_fn=None
)
Creates a Session. Makes sure the model is ready to be used.
Creates a Session on 'master'. If a saver object is passed in, and checkpoint_dir points to a directory containing valid checkpoint files, then it will try to recover the model from the checkpoint. If no checkpoint files are available, and wait_for_checkpoint is True, then the process checks every recovery_wait_secs, up to max_wait_secs, for recovery to succeed.
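The wait-and-retry behavior described above can be sketched in plain Python (hypothetical helper names, no TensorFlow):

```python
import time

def try_recover_with_wait(restore_fn, wait_for_checkpoint=False,
                          recovery_wait_secs=30, max_wait_secs=7200,
                          clock=time.monotonic, sleep=time.sleep):
    """Sketch of the recovery loop: try to restore from a checkpoint,
    optionally retrying every recovery_wait_secs until max_wait_secs
    have elapsed. restore_fn returns True on a successful restore."""
    deadline = clock() + max_wait_secs
    while True:
        if restore_fn():
            return True
        if not wait_for_checkpoint or clock() >= deadline:
            return False
        sleep(recovery_wait_secs)
```

With wait_for_checkpoint left at its default of False, a failed restore returns immediately instead of polling.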
If the model cannot be recovered successfully then it is initialized by running the init_op and calling init_fn if they are provided. The local_init_op is also run after init_op and init_fn, regardless of whether the model was recovered successfully, but only if ready_for_local_init_op passes.
If the model is recovered from a checkpoint it is assumed that all global variables have been initialized; in particular, neither init_op nor init_fn will be executed.
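Putting these rules together, the initialization order can be sketched as follows (hypothetical callables stand in for the real ops, and the ready_for_local_init_op gate is omitted for brevity):

```python
def initialize_model(recover_fn, init_op=None, init_fn=None, local_init_op=None):
    """Sketch of the documented order: try recovery first; only when
    recovery fails run init_op and init_fn; run local_init_op in
    either case, after init_op/init_fn when those ran."""
    steps = []
    recovered = recover_fn()
    if not recovered:
        if init_op is not None:
            init_op()
            steps.append("init_op")
        if init_fn is not None:
            init_fn()
            steps.append("init_fn")
    if local_init_op is not None:
        local_init_op()
        steps.append("local_init_op")
    return recovered, steps
```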