View source on GitHub |
A driver that takes N episodes in an environment using a tf.while_loop.
Inherits From: Driver
tf_agents.drivers.dynamic_episode_driver.DynamicEpisodeDriver(
env, policy, observers=None, transition_observers=None, num_episodes=1
)
Used in the notebooks
Used in the tutorials |
---|
The while loop will run num_episodes in the environment, counting transitions that result in ending an episode.
As environments run batched time_episodes, the counters for all batch elements are summed, and execution stops when the total exceeds num_episodes.
This termination condition can be overridden in subclasses by implementing the self._loop_condition_fn() method.
Raises | |
---|---|
ValueError
|
If env is not a tf_environment.Base or policy is not an instance of tf_policy.TFPolicy. |
Attributes | |
---|---|
env
|
|
info_observers
|
|
observers
|
|
policy
|
|
transition_observers
|
Methods
run
run(
time_step=None,
policy_state=None,
num_episodes=None,
maximum_iterations=None
)
Takes episodes in the environment using the policy and update observers.
If time_step
and policy_state
are not provided, run
will reset the
environment and request an initial state from the policy.
Args | |
---|---|
time_step
|
optional initial time_step. If None, it will be obtained by resetting the environment. Elements should be shape [batch_size, ...]. |
policy_state
|
optional initial state for the policy. If None, it will be obtained from the policy.get_initial_state(). |
num_episodes
|
Optional number of episodes to take in the environment. If None it would use initial num_episodes. |
maximum_iterations
|
Optional maximum number of iterations of the while loop to run. If provided, the cond output is AND-ed with an additional condition ensuring the number of iterations executed is no greater than maximum_iterations. |
Returns | |
---|---|
time_step
|
TimeStep named tuple with final observation, reward, etc. |
policy_state
|
Tensor with final step policy state. |