tf_agents.drivers.dynamic_episode_driver.DynamicEpisodeDriver

A driver that takes N episodes in an environment using a tf.while_loop.

Inherits From: Driver

View source on GitHub:
https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/drivers/dynamic_episode_driver.py#L44-L259

    tf_agents.drivers.dynamic_episode_driver.DynamicEpisodeDriver(
        env, policy, observers=None, transition_observers=None, num_episodes=1
    )

The while loop will run num_episodes episodes in the environment, counting
transitions that result in ending an episode.
As environments run batched time_steps, the counters for all batch elements
are summed, and execution stops when the total exceeds num_episodes.
This termination condition can be overridden in subclasses by implementing the
self._loop_condition_fn() method.
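
For orientation, here is a minimal, hedged usage sketch. The CartPole
environment, random policy, and replay-buffer observer are illustrative
assumptions, not part of this reference; any tf_environment.Base environment,
tf_policy.TFPolicy, and observer callables are wired up the same way.

    # A minimal sketch, not from this page: CartPole-v0, a RandomTFPolicy, and
    # a replay buffer observer are illustrative choices, not requirements.
    from tf_agents.drivers import dynamic_episode_driver
    from tf_agents.environments import suite_gym, tf_py_environment
    from tf_agents.policies import random_tf_policy
    from tf_agents.replay_buffers import tf_uniform_replay_buffer

    # Wrap the Python environment so the driver can step it in a tf.while_loop.
    env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

    policy = random_tf_policy.RandomTFPolicy(
        env.time_step_spec(), env.action_spec())

    # The replay buffer's add_batch method is a callable(Trajectory) observer.
    replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
        data_spec=policy.trajectory_spec,
        batch_size=env.batch_size,
        max_length=1000)

    driver = dynamic_episode_driver.DynamicEpisodeDriver(
        env,
        policy,
        observers=[replay_buffer.add_batch],
        num_episodes=2)

    # Runs the tf.while_loop until two full episodes have been collected.
    final_time_step, final_policy_state = driver.run()

For a batched environment, num_episodes=2 means two episodes summed across all
batch elements, not two episodes per element.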
Args
env
A tf_environment.Base environment.
policy
A tf_policy.TFPolicy policy.
observers
A list of observers that are updated after every step in the
environment. Each observer is a callable(Trajectory); see the observer
sketch after the Raises section below.
transition_observers
A list of observers that are updated after every
step in the environment. Each observer is a callable((TimeStep,
PolicyStep, NextTimeStep)).
num_episodes
The number of episodes to take in the environment. For
batched or parallel environments, this is the total number of episodes
summed across all environments.
Raises
ValueError
If env is not a tf_environment.Base or policy is not an instance of
tf_policy.TFPolicy.
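
As referenced above, the observers and transition_observers arguments accept
plain Python callables. The sketch below is illustrative only; EpisodeCounter
and log_transition are hypothetical names, not part of the tf_agents API.

    # Illustrative observer callables; EpisodeCounter and log_transition are
    # hypothetical names, not part of tf_agents.
    import tensorflow as tf


    class EpisodeCounter:
      """callable(Trajectory): counts boundary (episode-ending) transitions."""

      def __init__(self):
        self.count = tf.Variable(0, dtype=tf.int32)

      def __call__(self, trajectory):
        # is_boundary() marks the final transition of each episode.
        self.count.assign_add(
            tf.reduce_sum(tf.cast(trajectory.is_boundary(), tf.int32)))


    def log_transition(transition):
      """callable((TimeStep, PolicyStep, NextTimeStep)): prints each step."""
      time_step, policy_step, next_time_step = transition
      tf.print('action:', policy_step.action, 'reward:', next_time_step.reward)


    # Both kinds of observer are passed to the constructor, for example:
    #   DynamicEpisodeDriver(env, policy,
    #                        observers=[EpisodeCounter()],
    #                        transition_observers=[log_transition],
    #                        num_episodes=1)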
Methods

run

    run(
        time_step=None,
        policy_state=None,
        num_episodes=None,
        maximum_iterations=None
    )

Takes episodes in the environment using the policy and updates observers.

If time_step and policy_state are not provided, run will reset the
environment and request an initial state from the policy; a resumption sketch
is shown after the Returns section below.

Note on bias when using batched environments with num_episodes: when
num_episodes is not None, a run step finishes once num_episodes episodes have
been completely collected (hit a boundary). When used with environments that
have variable-length episodes, this skews the distribution of collected
episode lengths: short episodes are seen more frequently than long ones. As a
result, running an env of N > 1 batched environments with num_episodes >= 1
is not the same as running an env with 1 environment and num_episodes >= 1.
Args
time_step
optional initial time_step. If None, it will be obtained by
resetting the environment. Elements should be shape [batch_size, ...].
policy_state
optional initial state for the policy. If None, it will be
obtained from the policy.get_initial_state().
num_episodes
Optional number of episodes to take in the environment. If
None, the num_episodes value provided at construction is used.
maximum_iterations
Optional maximum number of iterations of the while
loop to run. If provided, the cond output is AND-ed with an additional
condition ensuring the number of iterations executed is no greater than
maximum_iterations.
Returns
time_step
TimeStep named tuple with the final observation, reward, etc.
policy_state
Tensor with the final step policy state.
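
A hedged sketch of resuming collection across run() calls is shown below; the
environment, policy, loop count, and maximum_iterations value are illustrative
assumptions, matching the construction example earlier on this page.

    # Sketch of calling run() repeatedly; the environment and policy are the
    # same illustrative choices as in the construction example above.
    from tf_agents.drivers import dynamic_episode_driver
    from tf_agents.environments import suite_gym, tf_py_environment
    from tf_agents.policies import random_tf_policy

    env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))
    policy = random_tf_policy.RandomTFPolicy(
        env.time_step_spec(), env.action_spec())
    driver = dynamic_episode_driver.DynamicEpisodeDriver(
        env, policy, num_episodes=1)

    time_step = None
    policy_state = None
    for _ in range(3):
      # First call: both are None, so the environment is reset and the policy
      # supplies its initial state. Later calls continue from the values
      # returned by the previous call instead of resetting.
      time_step, policy_state = driver.run(
          time_step=time_step,
          policy_state=policy_state,
          num_episodes=1,           # overrides the constructor's num_episodes
          maximum_iterations=1000)  # extra cap on tf.while_loop iterations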