Reverb trajectory sequence observer.
Inherits From: ReverbAddTrajectoryObserver
tf_agents.replay_buffers.reverb_utils.ReverbTrajectorySequenceObserver(
py_client: tf_agents.typing.types.ReverbClient,
table_name: Union[Text, Sequence[Text]],
sequence_length: int,
stride_length: int = 1,
priority: Union[float, int] = 1,
pad_end_of_episodes: bool = False,
tile_end_of_episodes: bool = False
)
This is equivalent to ReverbAddTrajectoryObserver but sequences are not cut when a boundary trajectory is seen. This allows for sequences to be sampled with boundaries anywhere in the sequence rather than just at the end.
Consider using this observer when you want to create training experience that can encompass any subsequence of the observed trajectories.
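
To make the difference concrete, here is a simplified pure-Python model; it is not the TF-Agents implementation. The helper `collect_windows`, the step labels, and the `cut_at_boundary` flag are hypothetical names introduced for illustration, and steps whose label ends in `B` stand for boundary trajectories. The sketch assumes an item is emitted as soon as `sequence_length` steps are cached, and that the cutting variant clears its cache after each boundary.

```python
from typing import List, Sequence, Tuple


def collect_windows(
    steps: Sequence[str],
    sequence_length: int,
    cut_at_boundary: bool,
) -> List[Tuple[str, ...]]:
    """Collects every window of `sequence_length` consecutive cached steps.

    Hypothetical model: steps whose label ends in "B" are boundary steps.
    """
    items: List[Tuple[str, ...]] = []
    cache: List[str] = []
    for step in steps:
        cache.append(step)
        if len(cache) >= sequence_length:
            items.append(tuple(cache[-sequence_length:]))
        if cut_at_boundary and step.endswith("B"):
            cache = []  # Start a fresh sequence after the boundary.
    return items


# Two short episodes, each ending in a boundary step.
steps = ["a0", "a1", "aB", "b0", "b1", "bB"]

cut = collect_windows(steps, sequence_length=2, cut_at_boundary=True)
uncut = collect_windows(steps, sequence_length=2, cut_at_boundary=False)

print(cut)    # Boundary steps appear only at the end of a window.
print(uncut)  # Includes ("aB", "b0"), which spans the episode boundary.
```

In the uncut result, the window `("aB", "b0")` has the boundary at its start, which is the kind of item the cutting variant never produces.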
Args | |
---|---|
py_client | Python client for the Reverb replay server. |
table_name | The name(s) of the table(s) to which samples are written. |
sequence_length | The sequence length used when writing to the given table. |
stride_length | The integer stride of the sliding window used to create overlapping sequences. The default value of 1 creates an item for every window. With L = sequence_length, this means items are created for times {0, 1, ..., L-1}, {1, 2, ..., L}, ... . In contrast, stride_length = L creates an item only for disjoint windows {0, 1, ..., L-1}, {L, ..., 2 * L - 1}, ... . |
priority | Initial priority for new samples in the replay buffer. |
pad_end_of_episodes | At the end of an episode, the cache is dropped by default. When pad_end_of_episodes = True, the cache is padded with boundary steps (last->first) with 0 values everywhere, and padded items of length sequence_length are written to Reverb. |
tile_end_of_episodes | If pad_end_of_episodes is True then, the last padded item starts with a boundary step from the episode. When this option is True, the following items are generated: (F, M, L, P), (M, L, P, P), (L, P, P, P). If False, only a single item is generated: (F, M, L, P). For training recurrent models on environments where required information is only available at the start of the episode it is useful to set |
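
The effect of stride_length on which time steps end up in each item can be sketched with a few lines of plain Python. The helper `window_indices` and its `num_steps` parameter are hypothetical and only reproduce the index sets described for stride_length above; they do not call into TF-Agents or Reverb.

```python
from typing import List


def window_indices(
    num_steps: int,
    sequence_length: int,
    stride_length: int = 1,
) -> List[List[int]]:
    """Returns the time indices covered by each item for `num_steps` steps."""
    if sequence_length < 1 or stride_length < 1:
        raise ValueError("sequence_length and stride_length must be positive.")
    last_start = num_steps - sequence_length
    # No window fits when fewer than `sequence_length` steps were observed.
    return [
        list(range(start, start + sequence_length))
        for start in range(0, last_start + 1, stride_length)
    ]


overlapping = window_indices(num_steps=6, sequence_length=3, stride_length=1)
disjoint = window_indices(num_steps=6, sequence_length=3, stride_length=3)

print(overlapping)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(disjoint)     # [[0, 1, 2], [3, 4, 5]]
```

A stride of 1 maximizes the number of items at the cost of storing heavily overlapping sequences; a stride equal to the sequence length stores each step exactly once.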
Raises | |
---|---|
ValueError | If tile_end_of_episodes is set without pad_end_of_episodes. |
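
The interaction between pad_end_of_episodes and tile_end_of_episodes, including the ValueError above, can be modelled as follows. This is a sketch under stated assumptions, not the library's code: `end_of_episode_items` and `pad_symbol` are hypothetical names, and the model covers only the documented case in which fewer than sequence_length steps are cached when the episode ends.

```python
from typing import List, Tuple


def end_of_episode_items(
    cached_steps: List[str],
    sequence_length: int,
    pad_end_of_episodes: bool = False,
    tile_end_of_episodes: bool = False,
    pad_symbol: str = "P",
) -> List[Tuple[str, ...]]:
    """Models the items written for a short cache at the end of an episode."""
    if tile_end_of_episodes and not pad_end_of_episodes:
        raise ValueError("tile_end_of_episodes requires pad_end_of_episodes.")
    if not 0 < len(cached_steps) < sequence_length:
        raise NotImplementedError("Only short, non-empty caches are modelled.")
    if not pad_end_of_episodes:
        return []  # The cache is dropped by default.

    # Pad generously, then slice one window per chosen start position.
    padded = cached_steps + [pad_symbol] * sequence_length
    num_items = len(cached_steps) if tile_end_of_episodes else 1
    return [tuple(padded[i:i + sequence_length]) for i in range(num_items)]


cache = ["F", "M", "L"]
tiled = end_of_episode_items(cache, 4, pad_end_of_episodes=True,
                             tile_end_of_episodes=True)
single = end_of_episode_items(cache, 4, pad_end_of_episodes=True)
dropped = end_of_episode_items(cache, 4)

print(tiled)    # [('F', 'M', 'L', 'P'), ('M', 'L', 'P', 'P'), ('L', 'P', 'P', 'P')]
print(single)   # [('F', 'M', 'L', 'P')]
print(dropped)  # []
```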
Methods
close
close() -> None
Closes the writer of the observer.
flush
flush()
Ensures that items are pushed to the service.
open
open() -> None
Opens the writer of the observer.
reset
reset(
write_cached_steps: bool = True
) -> None
Resets the state of the observer.
Args | |
---|---|
write_cached_steps | Boolean flag indicating whether we want to write the cached trajectory. When this argument is True, the function attempts to write the cached data before resetting (optionally with padding). Otherwise, the cached data gets dropped. |
__call__
__call__(
trajectory: tf_agents.trajectories.Trajectory
) -> None
Writes the trajectory into the underlying replay buffer.
Allows trajectory to be a flattened trajectory. No batch dimension allowed.
Args | |
---|---|
trajectory | The trajectory to be written, which can be a (possibly nested) trajectory object or a flattened version of a trajectory. It is assumed to have no batch dimension. |
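
The methods above might be combined as in the following sketch. The driver `write_and_close` is a hypothetical helper, and the order reset, flush, close is an assumption about a reasonable shutdown sequence rather than something this page prescribes. The `RecordingObserver` class is a stand-in that only mirrors the method names, so the example runs without a Reverb server; in practice `observer` would be a constructed ReverbTrajectorySequenceObserver.

```python
from typing import Any, Iterable, List


def write_and_close(
    observer: Any,
    trajectories: Iterable[Any],
    write_cached_steps: bool = True,
) -> None:
    """Feeds unbatched trajectories to an observer, then flushes and closes it."""
    try:
        for trajectory in trajectories:
            observer(trajectory)  # __call__ takes one unbatched trajectory.
        # Write (or drop) whatever is still cached, then push items out.
        observer.reset(write_cached_steps=write_cached_steps)
        observer.flush()
    finally:
        observer.close()  # Always close the writer, even on errors.


class RecordingObserver:
    """Stand-in with the same method names; records calls instead of writing."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __call__(self, trajectory: Any) -> None:
        self.calls.append(f"call:{trajectory}")

    def reset(self, write_cached_steps: bool = True) -> None:
        self.calls.append(f"reset:{write_cached_steps}")

    def flush(self) -> None:
        self.calls.append("flush")

    def close(self) -> None:
        self.calls.append("close")


fake = RecordingObserver()
write_and_close(fake, ["t0", "t1"])
print(fake.calls)  # ['call:t0', 'call:t1', 'reset:True', 'flush', 'close']
```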