tf.compat.v1.train.MonitoredSession

Session-like object that handles initialization, recovery and hooks.

tf.compat.v1.train.MonitoredSession(
    session_creator=None, hooks=None, stop_grace_period_secs=120
)

Migrate to TF2

This API is not compatible with eager execution and tf.function. To migrate to TF2, rewrite the code to be compatible with eager execution. See the migration guide on replacing Session.run calls. In Keras, session hooks can be replaced by Callbacks (for an example, see the logging hook notebook). For more details, read Better performance with tf.function.

Description

Example usage:

saver_hook = CheckpointSaverHook(...)
summary_hook = SummarySaverHook(...)
with MonitoredSession(session_creator=ChiefSessionCreator(...),
                      hooks=[saver_hook, summary_hook]) as sess:
  while not sess.should_stop():
    sess.run(train_op)

Initialization: At creation time, the monitored session does the following things in the given order:

calls hook.begin() for each given hook
finalizes the graph via scaffold.finalize()
creates the session
initializes the model via initialization ops provided by Scaffold
restores variables if a checkpoint exists
launches queue runners
calls hook.after_create_session()

Run: When run() is called, the monitored session does the following things:

calls hook.before_run()
calls TensorFlow session.run() with merged fetches and feed_dict
calls hook.after_run()
returns result of session.run() asked by user
if AbortedError or UnavailableError occurs, it recovers or reinitializes the session before executing the run() call again
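
The run sequence above can be modelled in a few lines of plain Python. This is a toy sketch, not TensorFlow's implementation: `ToyAbortedError`, `ToyHook`, `FlakySession`, `create_session`, and `monitored_run` are hypothetical names, and the "session" only evaluates Python callables. It illustrates hook fetches being merged with the user's fetch, only the user's result being returned, and the call being retried on a fresh session after an abort-style error.

```python
class ToyAbortedError(Exception):
    """Hypothetical stand-in for AbortedError / UnavailableError."""


class ToyHook:
    """Hypothetical hook that requests one extra fetch on every run."""

    def __init__(self):
        self.seen = []

    def before_run(self):
        # The extra fetch this hook wants evaluated alongside the user's.
        return lambda: 7

    def after_run(self, hook_result):
        self.seen.append(hook_result)


class FlakySession:
    """Toy session that raises an abort-style error a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures

    def run(self, fetches):
        if self.failures > 0:
            self.failures -= 1
            raise ToyAbortedError("worker was preempted")
        return {name: fn() for name, fn in fetches.items()}


sessions_created = []


def create_session():
    # The first session fails once; replacement sessions never fail.
    session = FlakySession(failures=0 if sessions_created else 1)
    sessions_created.append(session)
    return session


def monitored_run(hooks, user_fetch, max_retries=3):
    """Toy run(): merge fetches, call hooks, retry after an abort."""
    session = create_session()
    for _attempt in range(max_retries + 1):
        try:
            merged = {"user": user_fetch}
            for index, hook in enumerate(hooks):
                merged[("hook", index)] = hook.before_run()
            results = session.run(merged)
        except ToyAbortedError:
            # Recover by creating a fresh session, then run the call again.
            session = create_session()
            continue
        for index, hook in enumerate(hooks):
            hook.after_run(results[("hook", index)])
        # Only the result the user asked for is returned.
        return results["user"]
    raise RuntimeError("gave up after repeated abort-style errors")


hook = ToyHook()
result = monitored_run([hook], user_fetch=lambda: 42)
```

In this sketch the caller never sees the first failure: `result` holds only the user's value, while the hook receives its own fetch through `after_run`.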

Exit: At close(), the monitored session does the following things in order:

calls hook.end()
closes the queue runners and the session
suppresses the OutOfRange error, which indicates that all inputs have been processed, if the MonitoredSession is used as a context manager
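
The lifecycle described above, from hook.begin() through hook.end(), can be illustrated with a small pure-Python model. This is a simplified sketch under stated assumptions rather than the real class: `ToyOutOfRangeError`, `RecordingHook`, and `ToyMonitoredSession` are hypothetical names, an iterator plays the role of the session, and the hook methods take no arguments. It shows the order of the hook calls and how a context manager can swallow an end-of-input error in `__exit__`.

```python
class ToyOutOfRangeError(Exception):
    """Hypothetical stand-in for the OutOfRange error."""


class RecordingHook:
    """Hypothetical hook that records which lifecycle methods were called."""

    def __init__(self):
        self.calls = []

    def begin(self):
        self.calls.append("begin")

    def after_create_session(self):
        self.calls.append("after_create_session")

    def before_run(self):
        self.calls.append("before_run")

    def after_run(self):
        self.calls.append("after_run")

    def end(self):
        self.calls.append("end")


class ToyMonitoredSession:
    """Toy model of the initialization / run / exit ordering."""

    def __init__(self, inputs, hooks):
        self._hooks = list(hooks)
        for hook in self._hooks:
            hook.begin()
        self._inputs = iter(inputs)  # plays the role of the created session
        for hook in self._hooks:
            hook.after_create_session()
        self._stopped = False

    def should_stop(self):
        return self._stopped

    def run(self):
        for hook in self._hooks:
            hook.before_run()
        try:
            value = next(self._inputs)
        except StopIteration:
            raise ToyOutOfRangeError("all inputs processed") from None
        for hook in self._hooks:
            hook.after_run()
        return value

    def close(self):
        for hook in self._hooks:
            hook.end()
        self._stopped = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # Returning True suppresses the exception; only the end-of-input
        # error is swallowed, any other exception still propagates.
        return exc_type is not None and issubclass(exc_type, ToyOutOfRangeError)


hook = RecordingHook()
seen = []
with ToyMonitoredSession(inputs=[10, 20], hooks=[hook]) as sess:
    while not sess.should_stop():
        seen.append(sess.run())
```

The loop has no explicit end condition for the data: the third `run()` raises the end-of-input error, the `with` block exits, and `__exit__` closes the session and suppresses the error.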

How to set tf.compat.v1.Session arguments:

In most cases you can set session arguments as follows:

MonitoredSession(
  session_creator=ChiefSessionCreator(master=..., config=...))

In a distributed setting, for a non-chief worker, you can use the following:

MonitoredSession(
  session_creator=WorkerSessionCreator(master=..., config=...))

See MonitoredTrainingSession for an example usage based on chief or worker.

Note: This is not a tf.compat.v1.Session. For example:

it cannot be set as the default session.
it cannot be sent to saver.save.
it cannot be sent to tf.train.start_queue_runners.

Args
`session_creator`	A factory object used to create the session. Typically a `ChiefSessionCreator`, which is the default.
`hooks`	An iterable of `SessionRunHook` objects.

Returns
A MonitoredSession object.

Attributes
`graph`	The graph that was launched in this session.

Child Classes

class StepContext

Methods

`close`

close()

`run`

run(
    fetches, feed_dict=None, options=None, run_metadata=None
)

Run ops in the monitored session.

This method is completely compatible with the tf.compat.v1.Session.run() method.

Args
`fetches`	Same as `tf.compat.v1.Session.run()`.
`feed_dict`	Same as `tf.compat.v1.Session.run()`.
`options`	Same as `tf.compat.v1.Session.run()`.
`run_metadata`	Same as `tf.compat.v1.Session.run()`.

Returns
Same as `tf.compat.v1.Session.run()`.

`run_step_fn`

run_step_fn(
    step_fn
)

Run ops using a step function.

Args

step_fn

A function or a method with a single argument of type StepContext. The function may use methods of the argument to perform computations with access to a raw session. The returned value of the step_fn will be returned from run_step_fn, unless a stop is requested. In that case, the next should_stop call will return True. Example usage:

```python
import tensorflow as tf

with tf.Graph().as_default():
  c = tf.compat.v1.placeholder(tf.float32)
  v = tf.add(c, 4.0)
  w = tf.add(c, 0.5)

  def step_fn(step_context):
    # Evaluate `v` on the raw session, bypassing the hooks.
    a = step_context.session.run(fetches=v, feed_dict={c: 0.5})
    if a <= 4.5:
      # Raises StopIteration, so nothing after this line runs in this branch.
      step_context.request_stop()
    # Evaluate `w` with the hooks applied.
    return step_context.run_with_hooks(fetches=w,
                                       feed_dict={c: 0.1})

  with tf.compat.v1.train.MonitoredSession() as session:
    while not session.should_stop():
      a = session.run_step_fn(step_fn)
```
Hooks interact with the `run_with_hooks()` call inside the `step_fn` as they do with a `MonitoredSession.run` call.

Returns
Returns the returned value of `step_fn`.

Raises
`StopIteration`	if `step_fn` has called `request_stop()`. It may be caught by `with tf.MonitoredSession()` to close the session.
`ValueError`	if `step_fn` doesn't have a single argument called `step_context`. It may also optionally have `self` for cases when it belongs to an object.

`should_stop`

should_stop()

`__enter__`

__enter__()

`__exit__`

__exit__(
    exception_type, exception_value, traceback
)
