The actor manages interactions between a policy and an environment. Users should configure the metrics and summaries for a specific task like evaluation or data collection.

The main point of access for users is the run method. Each call iterates for either steps_per_run steps or episodes_per_run episodes. At least one of steps_per_run or episodes_per_run must be provided.
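
The stopping rule described above can be sketched in plain Python. This is a simplified stand-in, not the library's implementation: `run_loop` and `step_fn` are hypothetical names, and the behavior when both limits are given (stop as soon as either is reached) is an assumption.

```python
from typing import Callable, Optional, Tuple


def run_loop(step_fn: Callable[[], bool],
             steps_per_run: Optional[int] = None,
             episodes_per_run: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical stand-in for one run call.

    step_fn advances the environment by one step and returns True
    when that step ended an episode.
    """
    if steps_per_run is None and episodes_per_run is None:
        raise ValueError(
            "Provide at least one of steps_per_run or episodes_per_run.")

    steps = 0
    episodes = 0
    while True:
        # Assumption: the loop stops as soon as either limit is reached.
        if steps_per_run is not None and steps >= steps_per_run:
            break
        if episodes_per_run is not None and episodes >= episodes_per_run:
            break
        if step_fn():
            episodes += 1
        steps += 1
    return steps, episodes


def make_step_fn(episode_length: int) -> Callable[[], bool]:
    """Builds a toy step function whose episodes last episode_length steps."""
    state = {"n": 0}

    def step_fn() -> bool:
        state["n"] += 1
        return state["n"] % episode_length == 0

    return step_fn
```

With three-step episodes, `run_loop(make_step_fn(3), steps_per_run=5)` takes five steps, and `run_loop(make_step_fn(3), episodes_per_run=2)` runs until two episodes have finished.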

env An instance of either a tf or py environment. Note that the policy and observers should match the tf/pyness of the env.
policy An instance of a policy used to interact with the environment.
train_step A scalar tf.int64 tf.Variable which will keep track of the number of train steps. This is used for generated artifacts such as summaries.
steps_per_run Number of steps to evaluate per run call. See below.
episodes_per_run Number of episodes evaluated per run call.
observers A list of observers that are notified after every step in the environment. Each observer is a callable(trajectory.Trajectory).
transition_observers A list of observers that are updated after every step in the environment. Each observer is a callable((TimeStep, PolicyStep, NextTimeStep)). The transition is shaped just as trajectories are for regular observers.
info_observers A list of observers that are notified after every step in the environment. Each observer is a callable(info).
metrics A list of metric observers that output a scalar.
reference_metrics Optional list of metrics against which other metrics are plotted. For example, passing in a metric that tracks the number of environment episodes will produce summaries of all other metrics over this value. Note that summaries against the train_step are written by default. If you want reference_metrics to be updated, make sure they are also added to the metrics list.
image_metrics A list of metric observers that output an image.
summary_dir Path used for summaries. If no path is provided no summaries are written.
summary_interval How often summaries are written.
end_episode_on_boundary This parameter should be False when using transition observers and be True when using trajectory observers. It is used in py_driver.
name Name for the actor used as a prefix to generated summaries.
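
The observer arguments above are all lists of callables. A minimal sketch of that pattern in plain Python follows. `FakeTrajectory`, `RewardSum`, and `notify` are hypothetical stand-ins introduced for illustration; the real trajectory.Trajectory carries more fields, and the actor's own notification code is not shown here.

```python
from collections import namedtuple

# Stand-in for trajectory.Trajectory (assumption: reduced to three fields).
FakeTrajectory = namedtuple(
    "FakeTrajectory", ["observation", "action", "reward"])


class RewardSum:
    """Hypothetical observer: a callable object that accumulates reward."""

    def __init__(self):
        self.total = 0.0
        self.calls = 0

    def __call__(self, traj):
        self.total += traj.reward
        self.calls += 1


def notify(observers, traj):
    """Calls every observer with the same trajectory, in list order."""
    for observer in observers:
        observer(traj)


reward_sum = RewardSum()
seen = []
# Any callable works as an observer, including a bound method such as
# list.append.
observers = [reward_sum, seen.append]

for step, reward in enumerate([1.0, 0.5, 2.0]):
    notify(observers,
           FakeTrajectory(observation=step, action=0, reward=reward))
```

Keeping state on the observer object, as `RewardSum` does, lets the caller read results back after the run.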








Logs metric results to stdout.


Resets the environment to its start state and resets the policy state.


Generates scalar summaries for the actor metrics.