tf_agents.drivers.dynamic_step_driver.DynamicStepDriver
A driver that takes N steps in an environment using a tf.while_loop.
Inherits From: Driver
```python
tf_agents.drivers.dynamic_step_driver.DynamicStepDriver(
    env, policy, observers=None, transition_observers=None, num_steps=1
)
```
Used in the notebooks

- [Checkpointer and PolicySaver](https://www.tensorflow.org/agents/tutorials/10_checkpointer_policysaver_tutorial)
- [Replay Buffers](https://www.tensorflow.org/agents/tutorials/5_replay_buffers_tutorial)
- [Tutorial on Multi Armed Bandits in TF-Agents](https://www.tensorflow.org/agents/tutorials/bandits_tutorial)
- [A Tutorial on Multi-Armed Bandits with Per-Arm Features](https://www.tensorflow.org/agents/tutorials/per_arm_bandits_tutorial)
- [Tutorial on Ranking in TF-Agents](https://www.tensorflow.org/agents/tutorials/ranking_tutorial)
The while loop runs num_steps in the environment, counting only steps that
result in an environment transition, i.e. (time_step, action, next_time_step).
If a step results in the environment resetting, i.e. time_step.is_last() and
next_time_step.is_first() (traj.is_boundary()), it is not counted toward
num_steps.
Because environments run batched time_steps, the counters for all batch
elements are summed, and execution stops when the total exceeds num_steps.
When batch_size > 1, there is no guarantee that exactly num_steps are taken:
the count may be higher, but never lower.
This termination condition can be overridden in subclasses by implementing the
self._loop_condition_fn() method.
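
A minimal sketch of constructing and running the driver. The CartPole environment loaded through suite_gym, the random policy, and the replay-buffer observer are illustrative choices for this sketch, not requirements of DynamicStepDriver:

```python
import tensorflow as tf

from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.policies import random_tf_policy
from tf_agents.replay_buffers import tf_uniform_replay_buffer

# An example TF environment and policy (any tf_environment.Base and
# tf_policy.TFPolicy pair would do).
env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))
policy = random_tf_policy.RandomTFPolicy(env.time_step_spec(), env.action_spec())

# Observers receive a trajectory.Trajectory after every environment step;
# a replay buffer's add_batch method is a common choice.
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=policy.trajectory_spec,
    batch_size=env.batch_size,
    max_length=1000)

driver = dynamic_step_driver.DynamicStepDriver(
    env,
    policy,
    observers=[replay_buffer.add_batch],
    num_steps=100)

# Runs the tf.while_loop until at least 100 non-boundary steps were taken.
final_time_step, final_policy_state = driver.run()
```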
| Args | |
|------|---|
| `env` | A tf_environment.Base environment. |
| `policy` | A tf_policy.TFPolicy policy. |
| `observers` | A list of observers that are updated after every step in the environment. Each observer is a callable(trajectory.Trajectory); a sketch of both observer signatures follows this table. |
| `transition_observers` | A list of observers that are updated after every step in the environment. Each observer is a callable((TimeStep, PolicyStep, NextTimeStep)). |
| `num_steps` | The number of steps to take in the environment. For batched or parallel environments, this is the total number of steps taken, summed across all environments. |
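
The two observer signatures can be illustrated with hypothetical callables. This sketch reuses the env, policy, and imports from the example above; the tf.print calls stand in for real bookkeeping:

```python
def count_boundaries(traj):
  # `traj` is a trajectory.Trajectory batch; boundary steps mark episode
  # resets and are the steps that do not count toward num_steps.
  tf.print('boundary steps in batch:',
           tf.reduce_sum(tf.cast(traj.is_boundary(), tf.int32)))


def log_transition(transition):
  # `transition` is a (TimeStep, PolicyStep, TimeStep) tuple.
  time_step, policy_step, next_time_step = transition
  tf.print('action taken:', policy_step.action)


driver = dynamic_step_driver.DynamicStepDriver(
    env,
    policy,
    observers=[count_boundaries],
    transition_observers=[log_transition],
    num_steps=10)
driver.run()
```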
| Raises | |
|--------|---|
| `ValueError` | If env is not a tf_environment.Base or policy is not an instance of tf_policy.TFPolicy. |
| Attributes | |
|------------|---|
| `env` | |
| `info_observers` | |
| `observers` | |
| `policy` | |
| `transition_observers` | |
Methods
run
[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/drivers/dynamic_step_driver.py#L176-L197)
```python
run(
    time_step=None, policy_state=None, maximum_iterations=None
)
```
Takes steps in the environment using the policy while updating observers.
| Args | |
|------|---|
| `time_step` | Optional initial time_step. If None, it will use the current_time_step of the environment. Elements should be shape [batch_size, ...]. |
| `policy_state` | Optional initial state for the policy. |
| `maximum_iterations` | Optional maximum number of iterations of the while loop to run. If provided, the cond output is AND-ed with an additional condition ensuring the number of iterations executed is no greater than maximum_iterations. |
| Returns | |
|---------|---|
| `time_step` | TimeStep named tuple with final observation, reward, etc. |
| `policy_state` | Tensor with final step policy state. |
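
A sketch of calling run with an explicit starting point and a bound on the while loop; the env, policy, and driver here are assumed to come from the construction sketch earlier on this page:

```python
# Start collection from a freshly reset environment and the policy's
# initial state, capping the tf.while_loop at 200 iterations.
initial_time_step = env.reset()
initial_policy_state = policy.get_initial_state(env.batch_size)

final_time_step, final_policy_state = driver.run(
    time_step=initial_time_step,
    policy_state=initial_policy_state,
    maximum_iterations=200)

# A later call can resume from the state where the previous run stopped.
final_time_step, final_policy_state = driver.run(
    time_step=final_time_step,
    policy_state=final_policy_state)
```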
[null,null,["Last updated 2024-04-26 UTC."],[],[],null,["# tf_agents.drivers.dynamic_step_driver.DynamicStepDriver\n\n\u003cbr /\u003e\n\n|------------------------------------------------------------------------------------------------------------------------------|\n| [View source on GitHub](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/drivers/dynamic_step_driver.py#L47-L224) |\n\nA driver that takes N steps in an environment using a tf.while_loop.\n\nInherits From: [`Driver`](../../../tf_agents/drivers/driver/Driver) \n\n tf_agents.drivers.dynamic_step_driver.DynamicStepDriver(\n env, policy, observers=None, transition_observers=None, num_steps=1\n )\n\n### Used in the notebooks\n\n| Used in the tutorials |\n|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| - [Checkpointer and PolicySaver](https://www.tensorflow.org/agents/tutorials/10_checkpointer_policysaver_tutorial) - [Replay Buffers](https://www.tensorflow.org/agents/tutorials/5_replay_buffers_tutorial) - [Tutorial on Multi Armed Bandits in TF-Agents](https://www.tensorflow.org/agents/tutorials/bandits_tutorial) - [A Tutorial on Multi-Armed Bandits with Per-Arm Features](https://www.tensorflow.org/agents/tutorials/per_arm_bandits_tutorial) - [Tutorial on Ranking in TF-Agents](https://www.tensorflow.org/agents/tutorials/ranking_tutorial) |\n\nThe while loop will run num_steps in the environment, only counting steps that\nresult in an environment transition, i.e. (time_step, action, next_time_step).\nIf a step results in environment resetting, i.e. time_step.is_last() and\nnext_time_step.is_first() (traj.is_boundary()), this is not counted toward the\nnum_steps.\n\nAs environments run batched time_steps, the counters for all batch elements\nare summed, and execution stops when the total exceeds num_steps. When\nbatch_size \\\u003e 1, there is no guarantee that exactly num_steps are taken -- it\nmay be more but never less.\n\nThis termination condition can be overridden in subclasses by implementing the\nself._loop_condition_fn() method.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ---- ||\n|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `env` | A tf_environment.Base environment. |\n| `policy` | A tf_policy.TFPolicy policy. |\n| `observers` | A list of observers that are updated after every step in the environment. Each observer is a callable(time_step.Trajectory). |\n| `transition_observers` | A list of observers that are updated after every step in the environment. Each observer is a callable((TimeStep, PolicyStep, NextTimeStep)). |\n| `num_steps` | The number of steps to take in the environment. For batched or parallel environments, this is the total number of steps taken summed across all environments. 
|\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Raises ------ ||\n|--------------|-----------------------------------------------------------------------------------------|\n| `ValueError` | If env is not a tf_environment.Base or policy is not an instance of tf_policy.TFPolicy. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Attributes ---------- ||\n|------------------------|---------------|\n| `env` | \u003cbr /\u003e \u003cbr /\u003e |\n| `info_observers` | \u003cbr /\u003e \u003cbr /\u003e |\n| `observers` | \u003cbr /\u003e \u003cbr /\u003e |\n| `policy` | \u003cbr /\u003e \u003cbr /\u003e |\n| `transition_observers` | \u003cbr /\u003e \u003cbr /\u003e |\n\n\u003cbr /\u003e\n\nMethods\n-------\n\n### `run`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/drivers/dynamic_step_driver.py#L176-L197) \n\n run(\n time_step=None, policy_state=None, maximum_iterations=None\n )\n\nTakes steps in the environment using the policy while updating observers.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ||\n|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `time_step` | optional initial time_step. If None, it will use the current_time_step of the environment. Elements should be shape \\[batch_size, ...\\]. |\n| `policy_state` | optional initial state for the policy. |\n| `maximum_iterations` | Optional maximum number of iterations of the while loop to run. If provided, the cond output is AND-ed with an additional condition ensuring the number of iterations executed is no greater than maximum_iterations. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|----------------|-----------------------------------------------------------|\n| `time_step` | TimeStep named tuple with final observation, reward, etc. |\n| `policy_state` | Tensor with final step policy state. |\n\n\u003cbr /\u003e"]]