The actor manages interactions between a policy and an environment. Users should configure the metrics and summaries for a specific task like evaluation or data collection.

The main entry point for users is the run method. Each call iterates for either steps_per_run steps or episodes_per_run episodes. At least one of steps_per_run or episodes_per_run must be provided.
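The stopping rule described above can be illustrated with a minimal, library-free sketch. ToyEnv and run_sketch are hypothetical names invented for this example and are not part of the actor's API. The sketch also assumes that, when both limits are given, the run stops as soon as either one is reached.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyEnv:
    """Hypothetical environment whose episodes last `episode_length` steps."""

    episode_length: int = 3
    t: int = 0

    def step(self) -> bool:
        """Advance one step and return True if that step ended an episode."""
        self.t += 1
        if self.t == self.episode_length:
            self.t = 0
            return True
        return False


def run_sketch(
    env: ToyEnv,
    steps_per_run: Optional[int] = None,
    episodes_per_run: Optional[int] = None,
) -> Tuple[int, int]:
    """Step `env` until a limit is hit; return (steps taken, episodes finished)."""
    if steps_per_run is None and episodes_per_run is None:
        raise ValueError("Provide steps_per_run or episodes_per_run.")
    steps = 0
    episodes = 0
    while True:
        # Assumption: whichever limit is reached first ends the run.
        if steps_per_run is not None and steps >= steps_per_run:
            break
        if episodes_per_run is not None and episodes >= episodes_per_run:
            break
        if env.step():
            episodes += 1
        steps += 1
    return steps, episodes
```

With three-step episodes, run_sketch(ToyEnv(3), steps_per_run=5) stops mid-episode after five steps, while run_sketch(ToyEnv(3), episodes_per_run=2) runs for six steps so that both episodes complete.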

env An instance of either a tf or py environment. Note that the policy and observers should match the tf/py-ness of the env.
policy An instance of a policy used to interact with the environment.
train_step A scalar tf.int64 tf.Variable that keeps track of the number of train steps. It is used when creating artifacts such as summaries.
steps_per_run Number of steps to evaluate per run call. See below.
episodes_per_run Number of episodes to evaluate per run call.
observers A list of observers that are notified after every step in the environment. Each observer is a callable(trajectory.Trajectory).
transition_observers A list of observers that are updated after every step in the environment. Each observer is a callable((TimeStep, PolicyStep, NextTimeStep)). The transition is shaped just as trajectories are for regular observers.
info_observers A list of observers that are notified after every step in the environment. Each observer is a callable(info).
metrics A list of metric observers that output a scalar.
reference_metrics Optional list of metrics against which other metrics are plotted. For example, passing in a metric that tracks the number of environment episodes produces summaries of all other metrics over this value. Note that summaries against the train_step are written by default. If you want reference_metrics to be updated, make sure they are also added to the metrics list.
image_metrics A list of metric observers that output an image.
summary_dir Path used for summaries. If no path is provided, no summaries are written.
summary_interval How often summaries are written.
end_episode_on_boundary Should be False when using transition observers and True when using trajectory observers. It is used in py_driver.
name Name for the actor used as a prefix to generated summaries.
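The three observer lists differ only in what each callable receives after a step. The following library-free sketch shows that dispatch. The dictionaries stand in for TimeStep, PolicyStep and Trajectory, and notify_observers and EpisodeCounter are hypothetical names used for illustration only. The real classes and driver logic are not reproduced here.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Step = Dict[str, Any]  # hypothetical stand-in for TimeStep / PolicyStep


class EpisodeCounter:
    """Hypothetical metric observer that outputs a scalar: episodes finished."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self, trajectory: Step) -> None:
        # Count an episode whenever a trajectory marks its last step.
        if trajectory["is_last"]:
            self.count += 1

    def result(self) -> int:
        return self.count


def notify_observers(
    time_step: Step,
    policy_step: Step,
    next_time_step: Step,
    info: Any,
    observers: Sequence[Callable[[Step], None]] = (),
    transition_observers: Sequence[Callable[[Tuple[Step, Step, Step]], None]] = (),
    info_observers: Sequence[Callable[[Any], None]] = (),
) -> None:
    """Notify every observer once for a single environment step."""
    # Trajectory observers receive one merged record per step.
    trajectory = {
        "observation": time_step["observation"],
        "action": policy_step["action"],
        "reward": next_time_step["reward"],
        "is_last": next_time_step["is_last"],
    }
    for observer in observers:
        observer(trajectory)
    # Transition observers receive the raw (TimeStep, PolicyStep, NextTimeStep) tuple.
    for observer in transition_observers:
        observer((time_step, policy_step, next_time_step))
    # Info observers receive only the environment info.
    for observer in info_observers:
        observer(info)
```

Because a metric is itself an observer, an instance such as EpisodeCounter can be passed wherever a trajectory observer is expected and queried afterwards through result().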









Logs metric results to stdout.



Resets the environment to its start state and resets the policy state.



Generates scalar summaries for the actor metrics.
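As a rough illustration of how scalar summaries relate to train_step and reference_metrics, the sketch below builds one record per metric against the train step and one more per reference metric. The function name, the record layout and the tag format are all assumptions made for this example. The tags the actor actually writes are not specified here.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, float, float]  # (tag, x-axis value, metric value)


def scalar_summary_records(
    name: str,
    metric_values: Dict[str, float],
    train_step: int,
    reference_values: Optional[Dict[str, float]] = None,
) -> List[Record]:
    """Build summary records for each metric (hypothetical tag format)."""
    reference_values = reference_values or {}
    records: List[Record] = []
    for metric_name, value in metric_values.items():
        # Every metric is summarized against the train step by default.
        records.append((f"{name}/{metric_name}", float(train_step), value))
        # Each reference metric adds a second x-axis for the same value.
        for ref_name, ref_value in reference_values.items():
            if ref_name == metric_name:
                continue  # a metric is not plotted against itself
            records.append((f"{name}/{metric_name}_vs_{ref_name}", ref_value, value))
    return records
```

For instance, a metric named AverageReturn with a reference metric named NumberOfEpisodes yields two records: one indexed by the train step and one indexed by the episode count.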