
A driver that runs a Python policy in a Python environment.

Inherits From: Driver

env A py_environment.Base environment.
policy A py_policy.PyPolicy policy.
observers A list of observers that are notified after every step in the environment. Each observer is a callable(trajectory.Trajectory).
transition_observers A list of observers that are updated after every step in the environment. Each observer is a callable((TimeStep, PolicyStep, NextTimeStep)). The transition is shaped just as trajectories are for regular observers.
info_observers A list of observers that are notified after every step in the environment. Each observer is a callable(info).
max_steps Optional maximum number of steps for each run() call. For batched or parallel environments, this is the maximum total number of steps summed across all environments. See max_episodes for how the two limits interact. Default: 0.
max_episodes Optional maximum number of episodes for each run() call. For batched or parallel environments, this is the maximum total number of episodes summed across all environments. At least one of max_steps or max_episodes must be provided. If both are set, run() terminates when at least one of the conditions is satisfied. Default: 0.
end_episode_on_boundary Set this to False when using transition observers and to True when using trajectory observers.

ValueError If neither max_steps nor max_episodes is set to a positive value (both are None or 0).
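The stopping rule for max_steps and max_episodes can be sketched with a small, library-free loop. This is a simplified stand-in, not the PyDriver implementation: `toy_run`, `Step`, and `episode_length` are hypothetical names, the toy environment is a counter that ends an episode every `episode_length` steps, and treating 0 as "unset" is an assumption based on the documented default.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Step(NamedTuple):
    """Hypothetical stand-in for a trajectory; holds only what the loop needs."""
    observation: int
    is_last: bool  # True when this step closes an episode


def toy_run(
    episode_length: int,
    observers: Sequence[Callable[[Step], None]],
    max_steps: int = 0,
    max_episodes: int = 0,
) -> Tuple[int, int]:
    """Steps a toy environment until either limit is reached."""
    # Assumption: 0 means "unset", and at least one limit must be positive.
    if max_steps < 1 and max_episodes < 1:
        raise ValueError("Set max_steps or max_episodes to a positive value.")
    step_limit = max_steps if max_steps > 0 else float("inf")
    episode_limit = max_episodes if max_episodes > 0 else float("inf")

    num_steps = 0
    num_episodes = 0
    # Stop as soon as either limit is met.
    while num_steps < step_limit and num_episodes < episode_limit:
        position = num_steps % episode_length
        step = Step(observation=position, is_last=(position == episode_length - 1))
        # Every observer is called once per environment step.
        for observer in observers:
            observer(step)
        num_steps += 1
        num_episodes += int(step.is_last)
    return num_steps, num_episodes


collected: List[Step] = []
steps, episodes = toy_run(
    episode_length=3,
    observers=[collected.append],
    max_steps=10,
    max_episodes=2,
)
```

With both limits set, the episode limit is reached first here, so the loop stops well before max_steps; the observer list receives one entry per step taken.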









The run() method runs the policy in the environment, starting from the given initial time_step and policy_state.

time_step The initial time_step.
policy_state The initial policy_state.

A tuple (final time_step, final policy_state).
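Because run() takes an initial time_step and policy_state and returns the final pair, consecutive calls can be chained by feeding the returned values back in. The sketch below shows that pattern with a hypothetical `ToyDriver` whose "environment" is a plain counter. It is not the TF-Agents class, and the integer time step and policy state are placeholders for the real structures.

```python
from typing import Tuple


class ToyDriver:
    """Hypothetical stand-in for a driver; not the TF-Agents class."""

    def __init__(self, max_steps: int) -> None:
        self._max_steps = max_steps

    def run(self, time_step: int, policy_state: int = 0) -> Tuple[int, int]:
        """Advances the toy environment and returns the final pair."""
        for _ in range(self._max_steps):
            action = 1          # a fixed toy policy
            policy_state += 1   # the "policy state" tallies actions taken
            time_step += action  # the "environment" is a counter
        return time_step, policy_state


driver = ToyDriver(max_steps=3)

# First call starts from an initial time step and a default policy state.
time_step, policy_state = driver.run(time_step=0)

# Second call resumes exactly where the first one stopped.
time_step, policy_state = driver.run(time_step, policy_state)
```

Threading both values through means two calls of three steps leave the toy in the same state as one call of six steps.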