tf.compat.v1.train.SessionManager

Training helper that restores from checkpoint and creates session.

This class is a small wrapper that takes care of session creation and checkpoint recovery. It also provides functions to facilitate coordination among multiple training threads or processes.

  • Checkpointing trained variables as the training progresses.
  • Initializing variables on startup, restoring them from the most recent checkpoint after a crash, or waiting for checkpoints to become available.

Usage:

import tensorflow as tf

with tf.Graph().as_default():
  # ...add operations to the graph...
  # Create a SessionManager and prepare a session. If checkpoint_dir
  # (for example '/tmp/mydir') holds a valid checkpoint, the model is restored from it.
  sm = tf.compat.v1.train.SessionManager()
  sess = sm.prepare_session(master, init_op, saver, checkpoint_dir)
  # Use the session to train the graph.
  while True:
    sess.run(<my_train_op>)

prepare_session() initializes or restores a model. It requires init_op and saver as arguments.

A second process could wait for the model to be ready by doing the following:

import tensorflow as tf

with tf.Graph().as_default():
  # ...add operations to the graph...
  # Create a SessionManager that will wait for the model to become ready.
  sm = tf.compat.v1.train.SessionManager()
  sess = sm.wait_for_session(master)
  # Use the session to train the graph.
  while True:
    sess.run(<my_train_op>)

wait_for_session() waits for a model to be initialized by other processes.
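The waiting behavior can be pictured as a plain polling loop. The sketch below is a TensorFlow-free illustration, not SessionManager's actual implementation: `wait_until_ready`, `is_ready`, `sleep`, and `clock` are hypothetical names introduced here, and only `recovery_wait_secs` and `max_wait_secs` mirror parameters named in this documentation. Raising `TimeoutError` on timeout is also an assumption of the sketch.

```python
import time


def wait_until_ready(is_ready, recovery_wait_secs=30.0,
                     max_wait_secs=float("inf"),
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or max_wait_secs has elapsed.

    Hypothetical helper for illustration only. Returns the number of
    readiness checks performed; raises TimeoutError if the deadline passes.
    """
    deadline = clock() + max_wait_secs
    checks = 0
    while True:
        checks += 1
        if is_ready():
            return checks
        if clock() >= deadline:
            raise TimeoutError("model did not become ready in time")
        # Wait before checking again, as a second process would do while
        # another process initializes the model.
        sleep(recovery_wait_secs)


# A model that is already ready is detected on the first check, without sleeping.
checks = wait_until_ready(lambda: True)
```

The `sleep` and `clock` parameters exist only so that the loop can be exercised with a fake clock instead of real delays.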

Args

local_init_op: An Operation run immediately after session creation. Usually used to initialize tables and local variables.
ready_op: An Operation to check if the model is initialized.
ready_for_local_init_op: An Operation to check if the model is ready to run local_init_op.
graph: The Graph that the model will use.
recovery_wait_secs: Seconds between checks for the model to be ready.
local_init_run_options: RunOptions to be passed to session.run when executing the local_init_op.
local_init_feed_dict: Optional session feed dictionary to use when running the local_init_op.

Raises

ValueError: If ready_for_local_init_op is not None but local_init_op is None.

Methods

prepare_session

Creates a Session. Makes sure the model is ready to be used.

Creates a Session on 'master'. If a saver object is passed in and checkpoint_dir points to a directory containing valid checkpoint files, it tries to recover the model from the checkpoint. If no checkpoint files are available and wait_for_checkpoint is True, the process checks every recovery_wait_secs, up to max_wait_secs, for recovery to succeed.

If the model cannot be recovered successfully then it is initialized by running the init_op and calling init_fn if they are provided. The local_init_op is also run after init_op and init_fn, regardless of whether the model was recovered successfully, but only if ready_for_local_init_op passes.
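The ordering described above can be summarized in a small, TensorFlow-free sketch. It is a simplified model of the documented control flow, not the library's code: `prepare_model`, `restore_from_checkpoint`, and `ready_for_local_init` are hypothetical names, the operations are stood in for by plain Python callables, and error handling is left out.

```python
def prepare_model(restore_from_checkpoint, init_op=None, init_fn=None,
                  local_init_op=None, ready_for_local_init=lambda: True):
    """Run the initialization steps in the documented order.

    Hypothetical model for illustration only. Every argument is a plain
    callable standing in for the corresponding operation or function.
    Returns the names of the steps that were run, in order.
    """
    steps = []
    if restore_from_checkpoint():
        # Recovery succeeded, so init_op and init_fn are skipped.
        steps.append("restore")
    else:
        # No usable checkpoint: initialize from scratch with whatever was provided.
        if init_op is not None:
            init_op()
            steps.append("init_op")
        if init_fn is not None:
            init_fn()
            steps.append("init_fn")
    # Local initialization runs in both cases, but only if the readiness check passes.
    if local_init_op is not None and ready_for_local_init():
        local_init_op()
        steps.append("local_init_op")
    return steps


def noop():
    """Stand-in for an operation with no observable effect."""


recovered_steps = prepare_model(lambda: True, init_op=noop, local_init_op=noop)
fresh_steps = prepare_model(lambda: False, init_op=noop, init_fn=noop,
                            local_init_op=noop)
```

In this model, `recovered_steps` contains the restore step followed by the local init step, while `fresh_steps` contains init_op, init_fn, and then the local init step.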

If the model is recovered from a checkpoint it is assum