tf_agents.replay_buffers.ReverbAddTrajectoryObserver

Stateful observer for writing fixed-length trajectories to Reverb.

Main aliases

tf_agents.replay_buffers.reverb_utils.ReverbAddTrajectoryObserver

tf_agents.replay_buffers.ReverbAddTrajectoryObserver(
    py_client: tf_agents.typing.types.ReverbClient,
    table_name: Union[Text, Sequence[Text]],
    sequence_length: int,
    stride_length: int = 1,
    priority: Union[float, int] = 1,
    pad_end_of_episodes: bool = False,
    tile_end_of_episodes: bool = False
)

Used in the tutorials

Train a Deep Q Network with TF-Agents
SAC minitaur with the Actor-Learner API

This observer should be called at every environment step. It does not support batched trajectories.

Steps are cached until sequence_length steps are gathered, at which point an item is created. From then on, a new item is created every stride_length observer calls.

If an episode terminates before enough steps are cached, the data is discarded unless pad_end_of_episodes is set.
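
The windowing described above can be illustrated with a small pure-Python sketch. It does not use Reverb or TF-Agents; `window_indices` is a hypothetical helper introduced only for illustration, and it models just the documented behaviour for a single episode without padding.

```python
from typing import List


def window_indices(num_steps: int, sequence_length: int,
                   stride_length: int = 1) -> List[List[int]]:
    """Returns the step indices covered by each item (illustrative model only)."""
    items = []
    start = 0
    # The first item appears once `sequence_length` steps are cached; after
    # that, one more item appears every `stride_length` steps.
    while start + sequence_length <= num_steps:
        items.append(list(range(start, start + sequence_length)))
        start += stride_length
    return items


# Overlapping windows (stride 1) versus disjoint windows (stride == length).
print(window_indices(num_steps=5, sequence_length=3))
print(window_indices(num_steps=6, sequence_length=3, stride_length=3))
# An episode shorter than sequence_length yields no items in this model.
print(window_indices(num_steps=2, sequence_length=3))
```

In this model, an episode of 2 steps with `sequence_length=3` produces no items, which corresponds to the cached data being discarded when padding is disabled.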

Args
`py_client`	Python client for the reverb replay server.
`table_name`	The table name(s) where samples will be written to.
`sequence_length`	The sequence_length used to write to the given table.
`stride_length`	The integer stride for the sliding window for overlapping sequences. The default value of `1` creates an item for every window. With `L = sequence_length`, this means items are created for times `{0, 1, ..., L-1}, {1, 2, ..., L}, ...`. In contrast, `stride_length = L` will create an item only for disjoint windows `{0, 1, ..., L-1}, {L, ..., 2 * L - 1}, ...`.
`priority`	Initial priority for new samples in the replay buffer.
`pad_end_of_episodes`	At the end of an episode, the cache is dropped by default. When `pad_end_of_episodes = True`, the cache gets padded with boundary steps (last->first) with `0` values everywhere and padded items of `sequence_length` are written to Reverb.
`tile_end_of_episodes`	If `pad_end_of_episodes` is True, then the last padded item starts with a boundary step from the episode. When this option is True, the following items will be generated: `F, M, L, P`; `M, L, P, P`; `L, P, P, P`. If False, only a single one will be generated: `F, M, L, P`. For training recurrent models on environments where required information is only available at the start of the episode, it is useful to set `tile_end_of_episodes=False` and the `sequence_length` to be the length of the longest episode.

Raises
`ValueError`	If `tile_end_of_episodes` is set without `pad_end_of_episodes`.
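
The `F, M, L, P` example for `pad_end_of_episodes` and `tile_end_of_episodes` can be reproduced with a short pure-Python sketch. This is not the library's implementation: `end_of_episode_items` is a hypothetical helper, steps are plain strings, `"P"` stands for a padding step, and the sketch only covers a non-empty episode shorter than `sequence_length`, as in the documented example.

```python
from typing import List


def end_of_episode_items(episode: List[str], sequence_length: int,
                         tile_end_of_episodes: bool) -> List[List[str]]:
    """Pads a short episode with "P" steps, mirroring the documented example."""
    if not 0 < len(episode) < sequence_length:
        raise ValueError(
            "This sketch only covers non-empty episodes shorter than "
            "sequence_length.")
    # With tiling, one padded item starts at every episode step; without it,
    # only the item starting at the first step is produced.
    starts = range(len(episode)) if tile_end_of_episodes else range(1)
    items = []
    for start in starts:
        steps = episode[start:]
        items.append(steps + ["P"] * (sequence_length - len(steps)))
    return items


episode = ["F", "M", "L"]
print(end_of_episode_items(episode, 4, tile_end_of_episodes=True))
print(end_of_episode_items(episode, 4, tile_end_of_episodes=False))
```

With tiling disabled, every written item begins at the first step of the episode, which is why that setting suits recurrent models that need information from the start of the episode.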

Attributes
`py_client`

Methods

`close`

close() -> None

Closes the writer of the observer.

`flush`

flush()

Ensures that items are pushed to the service.

`get_table_signature`

get_table_signature()

`open`

open() -> None

Opens the writer of the observer.

`reset`

reset(
    write_cached_steps: bool = True
) -> None

Resets the state of the observer.

Args
`write_cached_steps`	Boolean flag indicating whether to write the cached trajectory. When this argument is True, the function attempts to write the cached data before resetting (optionally with padding). Otherwise, the cached data gets dropped.

`__call__`

__call__(
    trajectory: tf_agents.trajectories.Trajectory
) -> None

Writes the trajectory into the underlying replay buffer.

The trajectory may be passed as a flattened trajectory. A batch dimension is not allowed.

Args
`trajectory`	The trajectory to be written, which can be a (possibly nested) trajectory object or a flattened version of a trajectory. It is assumed to have no batch dimension.
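
A minimal usage sketch follows, loosely modelled on the setup pattern in the tutorials listed above. The helper names `build_observer` and `write_episode`, the table name `"uniform_table"`, and the `max_size` value are assumptions made for illustration; `collect_data_spec` is expected to come from an agent (for example `agent.collect_data_spec`). The Reverb and TF-Agents imports are placed inside the function so the snippet can be loaded where those packages are not installed.

```python
def build_observer(collect_data_spec, table_name="uniform_table",
                   max_size=1000, sequence_length=2):
    """Creates a local Reverb server and an observer writing to its table."""
    # Imported lazily so this module loads even where Reverb is not installed.
    import reverb
    from tf_agents.replay_buffers import reverb_utils
    from tf_agents.specs import tensor_spec

    # The table signature describes one item: a sequence of unbatched steps.
    signature = tensor_spec.add_outer_dim(
        tensor_spec.from_spec(collect_data_spec))
    table = reverb.Table(
        table_name,
        max_size=max_size,
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        rate_limiter=reverb.rate_limiters.MinSize(1),
        signature=signature)
    server = reverb.Server([table])
    client = reverb.Client(f"localhost:{server.port}")
    observer = reverb_utils.ReverbAddTrajectoryObserver(
        client, table_name, sequence_length=sequence_length, stride_length=1)
    # Return the server as well so the caller keeps it alive.
    return server, observer


def write_episode(observer, trajectories):
    """Calls the observer once per unbatched step, then flushes."""
    for trajectory in trajectories:
        observer(trajectory)
    # Ensure the items created so far are pushed to the service.
    observer.flush()
```

In practice the observer is usually handed to a driver or actor, which calls it at every environment step; `write_episode` only makes that per-step call explicit.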
