Stateful observer for writing fixed-length trajectories to Reverb.
tf_agents.typing.types.ReverbClient, table_name: Union[Text, Sequence[Text]], sequence_length: int, stride_length: int = 1, priority: Union[float, int] = 1, pad_end_of_episodes: bool = False, tile_end_of_episodes: bool = False )
This observer should be called at every environment step. It does not support batched trajectories.
Steps are cached until sequence_length steps are gathered, at which point an
item is created. From there on, a new item is created every stride_length steps.
If an episode terminates before enough steps are cached, the data is discarded
unless pad_end_of_episodes is set.
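The caching rule above can be illustrated with a small pure-Python sketch. It does not touch Reverb or the observer itself: the function name `sliding_items` is hypothetical, steps are stood in for by integers, and the sketch assumes that windows start every stride_length steps once the first full window exists and that an incomplete trailing window is dropped (the no-padding case).

```python
from typing import List, Sequence, Tuple


def sliding_items(steps: Sequence[int], sequence_length: int,
                  stride_length: int = 1) -> List[Tuple[int, ...]]:
    """Returns the windows a sliding-window writer would emit for one episode."""
    if sequence_length < 1 or stride_length < 1:
        raise ValueError("sequence_length and stride_length must be >= 1")
    items = []
    # A window is only emitted when it fits entirely inside the episode.
    for start in range(0, len(steps) - sequence_length + 1, stride_length):
        items.append(tuple(steps[start:start + sequence_length]))
    return items


episode = list(range(6))  # six steps, labelled 0..5
every_step = sliding_items(episode, sequence_length=3, stride_length=1)
disjoint = sliding_items(episode, sequence_length=3, stride_length=3)
too_short = sliding_items(episode[:2], sequence_length=3)
```

With a stride of 1 the windows overlap, with a stride equal to sequence_length they are disjoint, and an episode shorter than sequence_length yields no item at all in this sketch.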
||Python client for the Reverb replay server.|
|table_name|The table name(s) to which samples will be written.|
|sequence_length|The sequence length used when writing to the given table.|
|stride_length|The integer stride of the sliding window for overlapping sequences. The default value of 1 creates a new item at every step once sequence_length steps have been cached.|
|priority|Initial priority for new samples in the replay buffer.|
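|pad_end_of_episodes|At the end of an episode, the cache is dropped by default. When this option is True, the cached steps are padded and written instead.|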
|tile_end_of_episodes|When this option is True, the following items will be generated: (F, M, L, P), (M, L, P, P), (L, P, P, P). If False, only a single one will be generated: (F, M, L, P). For training recurrent models on environments where required information is only available at the start of the episode, it is useful to set this option to False.|
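The item patterns listed above can be reproduced with a short pure-Python sketch. It is only an illustration under stated assumptions: the letters are read as first (F), middle (M), last (L), and padding (P) steps, the helper name `padded_items` and the `PAD` marker are hypothetical, and only episodes shorter than sequence_length are covered. It does not reflect how the observer is implemented.

```python
from typing import List, Sequence, Tuple

PAD = "P"  # hypothetical marker standing in for a padding step


def padded_items(steps: Sequence[str], sequence_length: int,
                 tile: bool) -> List[Tuple[str, ...]]:
    """Pads a short episode to sequence_length, optionally tiling the start."""
    if not 0 < len(steps) < sequence_length:
        raise ValueError("sketch only covers short, non-empty episodes")
    # With tiling, one item starts at every real step; otherwise only at step 0.
    starts = range(len(steps)) if tile else range(1)
    items = []
    for start in starts:
        tail = list(steps[start:])
        items.append(tuple(tail + [PAD] * (sequence_length - len(tail))))
    return items


tiled = padded_items(["F", "M", "L"], sequence_length=4, tile=True)
single = padded_items(["F", "M", "L"], sequence_length=4, tile=False)
```

Without tiling, every generated item begins at the first step of the episode, which is why that setting suits recurrent models that need information available only at the episode start.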
close() -> None
Closes the writer of the observer.
Ensures that items are pushed to the service.
open() -> None
Opens the writer of the observer.
reset( write_cached_steps: bool = True ) -> None
Resets the state of the observer.
|write_cached_steps|Boolean flag indicating whether to write the cached trajectory. When this argument is True, the function attempts to write the cached data before resetting (optionally with padding). Otherwise, the cached data is dropped.|
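The effect of write_cached_steps can be sketched with a toy in-memory stand-in. This is not the real observer: the class `ToyObserver`, its `written` list, and the "P" padding marker are hypothetical, the stride is fixed at 1, and the sketch assumes that an incomplete cache can only be written when padding is enabled.

```python
from typing import List, Tuple


class ToyObserver:
    """In-memory stand-in for an observer with a step cache (illustration only)."""

    def __init__(self, sequence_length: int, pad_end_of_episodes: bool = False):
        self.sequence_length = sequence_length
        self.pad_end_of_episodes = pad_end_of_episodes
        self.cache: List[str] = []
        self.written: List[Tuple[str, ...]] = []

    def __call__(self, step: str) -> None:
        self.cache.append(step)
        if len(self.cache) == self.sequence_length:
            self.written.append(tuple(self.cache))
            self.cache.pop(0)  # slide the window by one step

    def reset(self, write_cached_steps: bool = True) -> None:
        incomplete = 0 < len(self.cache) < self.sequence_length
        # Writing is only attempted on request, and only succeeds with padding.
        if write_cached_steps and self.pad_end_of_episodes and incomplete:
            padding = ["P"] * (self.sequence_length - len(self.cache))
            self.written.append(tuple(self.cache + padding))
        self.cache.clear()


kept = ToyObserver(sequence_length=4, pad_end_of_episodes=True)
kept("F"); kept("M")
kept.reset(write_cached_steps=True)

dropped = ToyObserver(sequence_length=4, pad_end_of_episodes=True)
dropped("F"); dropped("M")
dropped.reset(write_cached_steps=False)
```

In both cases the cache is empty after the reset; the only difference is whether the two cached steps reach the `written` list as a padded item.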
tf_agents.trajectories.Trajectory) -> None
Writes the trajectory into the underlying replay buffer.
Allows trajectory to be a flattened trajectory. No batch dimension allowed.
||The trajectory to be written, which can be a (possibly nested) trajectory object or a flattened version of a trajectory. It is assumed to have no batch dimension.|