Session-like object that handles initialization, recovery and hooks.
tf.compat.v1.train.MonitoredSession(session_creator=None, hooks=None, stop_grace_period_secs=120)
Example usage:

    saver_hook = CheckpointSaverHook(...)
    summary_hook = SummarySaverHook(...)
    with MonitoredSession(session_creator=ChiefSessionCreator(...),
                          hooks=[saver_hook, summary_hook]) as sess:
        while not sess.should_stop():
            sess.run(train_op)
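The `while not sess.should_stop()` loop in the example only ends because something requests a stop. In TensorFlow that is typically a hook such as `StopAtStepHook` calling `run_context.request_stop()` inside `after_run()`. The sketch below is a toy, pure-Python model of that pattern: `ToyStopAtStepHook` and `ToyMonitoredSession` are hypothetical stand-ins (not TensorFlow classes), and the hook calls `session.request_stop()` directly instead of going through a run context.

```python
class ToyStopAtStepHook:
    """Hypothetical hook: requests a stop once `last_step` steps have run."""

    def __init__(self, last_step):
        self.last_step = last_step
        self.steps = 0

    def after_run(self, session):
        self.steps += 1
        if self.steps >= self.last_step:
            session.request_stop()  # makes the next should_stop() return True


class ToyMonitoredSession:
    """Hypothetical stand-in exposing only run/should_stop and context support."""

    def __init__(self, hooks=()):
        self.hooks = list(hooks)
        self._stop_requested = False
        self.run_count = 0

    def request_stop(self):
        self._stop_requested = True

    def should_stop(self):
        return self._stop_requested

    def run(self, op):
        result = op()  # stand-in for the underlying session.run(train_op)
        self.run_count += 1
        for hook in self.hooks:
            hook.after_run(self)  # hooks may request a stop here
        return result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return False  # this toy does not suppress any exception


stop_hook = ToyStopAtStepHook(last_step=3)
with ToyMonitoredSession(hooks=[stop_hook]) as sess:
    while not sess.should_stop():
        sess.run(lambda: None)  # stand-in for train_op

print(sess.run_count)  # 3
```

The training loop itself never counts steps; stopping is delegated to hooks, which is what lets several hooks (checkpointing, summaries, stop conditions) compose without changing the loop.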
Initialization: At creation time, the monitored session does the following, in order:
- calls hook.begin() for each given hook
- finalizes the graph via scaffold.finalize()
- creates the session
- initializes the model via the initialization ops provided by the Scaffold
- restores variables if a checkpoint exists
- launches queue runners
- calls hook.after_create_session() for each given hook
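A practical consequence of the initialization order is that a hook may add ops to the graph in `begin()`, while the graph is still mutable, but not in `after_create_session()`, which TensorFlow calls once the graph is finalized and the session exists. The toy model below illustrates only that ordering. `ToyGraph`, `RecordingHook` and `toy_create_monitored_session` are hypothetical names, the hooks receive the graph as an argument for simplicity (real hooks use the default graph, and `after_create_session` actually receives `(session, coord)`), and the initialize/restore steps are collapsed into one log entry.

```python
class ToyGraph:
    """Minimal stand-in for a graph that can be frozen."""

    def __init__(self):
        self.ops = []
        self.finalized = False

    def add_op(self, name):
        if self.finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        self.ops.append(name)


class RecordingHook:
    """Hypothetical hook that logs its calls and tries to add ops."""

    def __init__(self, log):
        self.log = log

    def begin(self, graph):
        graph.add_op("hook_counter")  # allowed: graph is still mutable
        self.log.append("hook.begin")

    def after_create_session(self, graph):
        self.log.append("hook.after_create_session")
        try:
            graph.add_op("too_late")  # rejected: graph is already finalized
        except RuntimeError:
            self.log.append("add_op rejected")


def toy_create_monitored_session(graph, hooks, log):
    for hook in hooks:
        hook.begin(graph)
    graph.finalized = True
    log.append("graph finalized")
    log.append("session created")
    log.append("variables initialized or restored")
    log.append("queue runners started")
    for hook in hooks:
        hook.after_create_session(graph)


log = []
graph = ToyGraph()
toy_create_monitored_session(graph, [RecordingHook(log)], log)
print(log)
print(graph.ops)  # ['hook_counter']
```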
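Run: When run() is called, the monitored session does the following:
- calls hook.before_run() for each given hook
- calls TensorFlow session.run() with the merged fetches and feed_dict
- calls hook.after_run() for each given hook
- returns the result of session.run() that the user asked for
- if an AbortedError or UnavailableError occurs, recovers or reinitializes the session before executing the run() call again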
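The run() steps can be sketched as a toy, pure-Python function: hooks contribute extra fetches in `before_run()`, everything goes through a single underlying `session.run()`, hooks see the full results in `after_run()`, and the caller gets back only what it asked for. Recoverable errors trigger a fresh session and a retry. Everything here is hypothetical: the two error classes only stand in for `tf.errors.AbortedError` and `tf.errors.UnavailableError`, fetches are plain strings, and `max_attempts` is a toy safeguard (TensorFlow's own recovery loop is not bounded this way).

```python
class AbortedError(Exception):
    """Stand-in for tf.errors.AbortedError."""


class UnavailableError(Exception):
    """Stand-in for tf.errors.UnavailableError."""


RECOVERABLE_ERRORS = (AbortedError, UnavailableError)


class ToySession:
    """Fake raw session that looks fetch names up in a dict of values."""

    def __init__(self, values, fail_times=0):
        self.values = values
        self.fail_times = fail_times  # number of run() calls that fail first

    def run(self, fetches):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise UnavailableError("worker temporarily unreachable")
        return {name: self.values[name] for name in fetches}


class LossLoggerHook:
    """Hypothetical hook that requests an extra 'loss' fetch on every step."""

    def __init__(self):
        self.seen = []

    def before_run(self):
        return ["loss"]

    def after_run(self, results):
        self.seen.append(results["loss"])


def toy_monitored_run(make_session, hooks, user_fetches, max_attempts=3):
    session = make_session()
    for _ in range(max_attempts):
        try:
            merged = list(user_fetches)
            for hook in hooks:  # before_run: collect extra fetches
                merged += [f for f in hook.before_run() if f not in merged]
            results = session.run(merged)  # one merged session.run() call
            for hook in hooks:  # after_run: hooks see all results
                hook.after_run(results)
            return {name: results[name] for name in user_fetches}  # user's part only
        except RECOVERABLE_ERRORS:
            session = make_session()  # recover, then retry the whole call
    raise RuntimeError("too many recoverable errors")


created = []


def make_session():
    # The first session fails once; any replacement session works.
    created.append(1)
    return ToySession({"train_op": None, "loss": 0.25},
                      fail_times=1 if len(created) == 1 else 0)


hook = LossLoggerHook()
result = toy_monitored_run(make_session, [hook], ["train_op"])
print(result)                   # {'train_op': None}
print(hook.seen, len(created))  # [0.25] 2
```

Because the retry re-enters the whole call, `before_run()` runs again on the new session, while `after_run()` only ever sees a successful step.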
Exit: On close(), the monitored session does the following, in order:
- calls hook.end() for each given hook
- closes the queue runners and the session
- suppresses the OutOfRange error, which indicates that all inputs have been processed, if the monitored session is used as a context manager
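The exit behavior is easiest to see with a context manager whose `__exit__` treats input exhaustion as a normal end of training. The following is a toy, pure-Python sketch: `OutOfRangeError` only stands in for `tf.errors.OutOfRangeError`, `EndRecorder` and `ToyMonitoredSession` are hypothetical, and "closing" just sets a flag. The sketch also assumes that `end()` is skipped when some other exception escapes the block; treat that detail as an assumption rather than documented behavior.

```python
class OutOfRangeError(Exception):
    """Stand-in for tf.errors.OutOfRangeError (input exhausted)."""


class EndRecorder:
    """Hypothetical hook that notes whether end() was called."""

    def __init__(self):
        self.ended = False

    def end(self):
        self.ended = True


class ToyMonitoredSession:
    def __init__(self, data, hooks=()):
        self._inputs = iter(data)
        self.hooks = list(hooks)
        self.closed = False

    def run(self):
        try:
            return next(self._inputs)
        except StopIteration:
            raise OutOfRangeError("End of sequence") from None

    def _close(self, call_end):
        if call_end:
            for hook in self.hooks:
                hook.end()  # hooks get a final call before shutdown
        self.closed = True  # stand-in for stopping queue runners and the session

    def close(self):
        self._close(call_end=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None and issubclass(exc_type, OutOfRangeError):
            exc_type = None  # input exhaustion counts as a normal exit
        self._close(call_end=exc_type is None)
        return exc_type is None  # True tells Python to swallow the exception


hook = EndRecorder()
seen = []
with ToyMonitoredSession([1, 2, 3], hooks=[hook]) as sess:
    while True:
        seen.append(sess.run())  # the fourth call raises OutOfRangeError

print(seen, sess.closed, hook.ended)  # [1, 2, 3] True True
```

This is why an input pipeline that simply runs dry ends a `with MonitoredSession(...)` block cleanly instead of surfacing an error.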
How to set tf.compat.v1.Session arguments:
- In most cases you can set session arguments as follows:
    MonitoredSession(
        session_creator=ChiefSessionCreator(master=..., config=...))
- In a distributed setting, for a non-chief worker, you can use the following:
    MonitoredSession(
        session_creator=WorkerSessionCreator(master=..., config=...))
See MonitoredTrainingSession for an example usage that picks the creator based on whether the task is the chief or a worker.
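The chief-versus-worker choice reduces to selecting one of two session-creator types from the task's role and passing the same master and config to either. The sketch below models only that selection logic in plain Python so it runs without TensorFlow: `ToyChiefSessionCreator`, `ToyWorkerSessionCreator` and `make_session_creator` are hypothetical stand-ins for `ChiefSessionCreator`, `WorkerSessionCreator` and the role check, the config dict stands in for a `tf.compat.v1.ConfigProto`, and the `grpc://` addresses are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChiefSessionCreator:
    """Stand-in for a chief creator: initializes or restores the model."""
    master: str = ""
    config: dict = field(default_factory=dict)


@dataclass
class ToyWorkerSessionCreator:
    """Stand-in for a worker creator: waits for the chief's initialized model."""
    master: str = ""
    config: dict = field(default_factory=dict)


def make_session_creator(is_chief, master, config):
    """Hypothetical helper: pick the creator type from the task's role."""
    creator_cls = ToyChiefSessionCreator if is_chief else ToyWorkerSessionCreator
    return creator_cls(master=master, config=config)


config = {"allow_soft_placement": True}  # stand-in for a session config proto
chief = make_session_creator(True, "grpc://host0:2222", config)
worker = make_session_creator(False, "grpc://host1:2222", config)
print(type(chief).__name__, type(worker).__name__)
# ToyChiefSessionCreator ToyWorkerSessionCreator
```

Only the chief initializes or restores variables, so workers never race to initialize the shared model.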
Note: This is not a tf.compat.v1.Session. For example:
- it cannot be set as the default session.
- it cannot be passed to saver.save.
- it cannot be passed to tf.train.start_queue_runners.
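Args:
|session_creator|A factory object to create the session. Typically a ChiefSessionCreator, which is the default.|
|hooks|An iterable of `SessionRunHook` objects.|

Returns:
|A MonitoredSession object.|

Attributes:
|graph|The graph that was launched in this session.|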
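run(fetches, feed_dict=None, options=None, run_metadata=None)
Run ops in the monitored session.
This method is fully compatible with the tf.compat.v1.Session.run() method: the arguments have the same meaning and the return value has the same structure.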
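run_step_fn(step_fn)
Run ops using a step function. step_fn takes a single argument, step_context, of type StepContext, and may use its methods to run computations with access to the raw session. run_step_fn returns the value that step_fn returns, unless a stop is requested.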
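In TensorFlow, the `StepContext` passed to `step_fn` offers `session` (the raw session, which does not trigger hooks), `run_with_hooks()` (which behaves like a monitored `run()`), and `request_stop()`, which raises `StopIteration`; a surrounding `with` block can absorb that exception to close the session. The toy, pure-Python sketch below models those three entry points. `ToyRawSession`, `ToyStepContext` and `ToyMonitoredSession` are hypothetical stand-ins, "ops" are plain callables, and `hook_calls` merely counts hooked runs in place of real `before_run()`/`after_run()` calls.

```python
class ToyRawSession:
    """Stand-in for the raw session: runs callables, no hooks involved."""

    def __init__(self):
        self.calls = 0

    def run(self, fn):
        self.calls += 1
        return fn()


class ToyStepContext:
    def __init__(self, raw_session, run_with_hooks):
        self.session = raw_session            # hooks are NOT triggered here
        self.run_with_hooks = run_with_hooks  # hooks ARE triggered here

    def request_stop(self):
        raise StopIteration("step_fn has requested the iterations to stop.")


class ToyMonitoredSession:
    def __init__(self):
        self._raw = ToyRawSession()
        self.hook_calls = 0
        self._stop = False

    def run(self, fn):
        self.hook_calls += 1  # stand-in for before_run()/after_run()
        return self._raw.run(fn)

    def run_step_fn(self, step_fn):
        try:
            return step_fn(ToyStepContext(self._raw, self.run))
        except StopIteration:
            self._stop = True  # the next should_stop() returns True
            raise

    def should_stop(self):
        return self._stop

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return exc_type is StopIteration  # absorb the stop request


results = []


def step_fn(step_context):
    value = step_context.session.run(lambda: len(results))  # raw: no hooks
    if value >= 2:
        step_context.request_stop()  # raises StopIteration
    return step_context.run_with_hooks(lambda: value * 10)  # hooked run


with ToyMonitoredSession() as session:
    while not session.should_stop():
        results.append(session.run_step_fn(step_fn))

print(results, session.hook_calls)  # [0, 10] 2
```

Splitting raw and hooked runs inside one step lets a step do cheap bookkeeping reads without triggering checkpoint or summary hooks on every call.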