A driver that takes N steps in an environment using a tf.while_loop.
Inherits From: Driver
tf_agents.drivers.dynamic_step_driver.DynamicStepDriver(
env, policy, observers=None, transition_observers=None, num_steps=1
)
The while loop runs num_steps steps in the environment, counting only steps that result in an environment transition, i.e. (time_step, action, next_time_step). A step that resets the environment, i.e. one where time_step.is_last() and next_time_step.is_first() (traj.is_boundary()), is not counted toward num_steps.
Because environments run batched time_steps, the counters for all batch elements are summed, and execution stops once the total reaches or exceeds num_steps. When batch_size > 1, there is no guarantee that exactly num_steps are taken: it may be more, but never less.
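The counting rule described above can be illustrated with a small pure-Python simulation. This is a minimal sketch, not the TF-Agents implementation: `simulate_step_count` and its `boundary_mask` argument are hypothetical names introduced only for illustration, and the stopping test (continue while the summed counter is below `num_steps`) is an assumption consistent with the "more but never less" guarantee.

```python
def simulate_step_count(num_steps, boundary_mask):
    """Simulate the summed per-batch step counter (illustrative only).

    boundary_mask[i][b] is True when loop iteration i is a boundary (reset)
    transition for batch element b; such transitions are not counted.
    The mask is cycled if the loop runs longer than len(boundary_mask).
    Returns (loop_iterations, total_counted_steps).
    """
    if all(all(row) for row in boundary_mask):
        # Nothing would ever be counted, so the loop would never terminate.
        raise ValueError("boundary_mask must contain a non-boundary entry")

    batch_size = len(boundary_mask[0])
    counters = [0] * batch_size
    iterations = 0
    # Assumed rule: keep looping while the summed counter is below num_steps.
    while sum(counters) < num_steps:
        row = boundary_mask[iterations % len(boundary_mask)]
        for b, is_boundary in enumerate(row):
            if not is_boundary:
                counters[b] += 1
        iterations += 1
    return iterations, sum(counters)


# batch_size = 1: exactly num_steps transitions are counted.
print(simulate_step_count(5, [[False]]))                # (5, 5)
# batch_size = 3: the total overshoots num_steps = 5 and ends at 6.
print(simulate_step_count(5, [[False, False, False]]))  # (2, 6)
```

With a batch of three, each loop iteration adds up to three counted steps, so the total jumps from 3 to 6 and the loop cannot stop at exactly 5.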
This termination condition can be overridden in subclasses by implementing the self._loop_condition_fn() method.
Raises | |
---|---|
ValueError | If env is not a tf_environment.Base or policy is not an instance of tf_policy.TFPolicy. |
Methods
run
run(
time_step=None, policy_state=None, maximum_iterations=None
)
Takes steps in the environment using the policy while updating observers.
Args | |
---|---|
time_step | Optional initial time_step. If None, it will use the current_time_step of the environment. Elements should be shape [batch_size, ...]. |
policy_state | Optional initial state for the policy. |
maximum_iterations | Optional maximum number of iterations of the while loop to run. If provided, the cond output is AND-ed with an additional condition ensuring the number of iterations executed is no greater than maximum_iterations. |
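The effect of maximum_iterations on the loop condition can be sketched in plain Python. This is an illustration under stated assumptions rather than the driver's actual code: `run_loop` and `loop_cond` are hypothetical helpers, every batch element is assumed to produce a counted transition on every iteration, and the base condition (summed counter below `num_steps`) is likewise assumed.

```python
def loop_cond(total, iterations, num_steps, maximum_iterations=None):
    """Base condition AND-ed with the optional iteration cap."""
    keep_going = total < num_steps  # assumed base termination condition
    if maximum_iterations is not None:
        # Extra condition: never run more than maximum_iterations iterations.
        keep_going = keep_going and iterations < maximum_iterations
    return keep_going


def run_loop(num_steps, batch_size, maximum_iterations=None):
    """Return (iterations, total_counted_steps) for a simplified loop."""
    total = 0
    iterations = 0
    while loop_cond(total, iterations, num_steps, maximum_iterations):
        total += batch_size  # assumes no boundary transitions occur
        iterations += 1
    return iterations, total


print(run_loop(10, 1))                         # (10, 10)
print(run_loop(10, 1, maximum_iterations=3))   # (3, 3): the cap ends the loop early
print(run_loop(10, 4, maximum_iterations=5))   # (3, 12): num_steps is reached first
```

Whichever condition becomes false first ends the loop, so a small maximum_iterations can stop the run before num_steps transitions have been counted.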
Returns | |
---|---|
time_step | TimeStep named tuple with final observation, reward, etc. |
policy_state | Tensor with final step policy state. |