tf_agents.replay_buffers.reverb_utils.ReverbTrajectorySequenceObserver

Reverb trajectory sequence observer.

Inherits From: ReverbAddTrajectoryObserver

View source on GitHub: https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/replay_buffers/reverb_utils.py#L509-L540

    tf_agents.replay_buffers.reverb_utils.ReverbTrajectorySequenceObserver(
        py_client: types.ReverbClient,
        table_name: Union[Text, Sequence[Text]],
        sequence_length: int,
        stride_length: int = 1,
        priority: Union[float, int] = 1,
        pad_end_of_episodes: bool = False,
        tile_end_of_episodes: bool = False
    )

This observer is equivalent to ReverbAddTrajectoryObserver, except that sequences are not cut when a boundary trajectory is seen. This allows sequences to be sampled with boundaries anywhere in the sequence rather than only at the end.
Consider using this observer when you want to create training experience that can encompass any subsequence of the observed trajectories.

Note: Counting of steps in drivers does not include boundary steps. To guarantee that only one item is pushed to the replay buffer when collecting n steps with a sequence_length of n, make sure to set the stride_length.
Args
py_client
Python client for the reverb replay server.
table_name
The table name(s) to which samples will be written.
sequence_length
The sequence_length used to write to the given table.
stride_length
The integer stride of the sliding window for overlapping sequences. The default value of 1 creates an item for every window: with L = sequence_length, items are created for steps {0, 1, ..., L-1}, {1, 2, ..., L}, and so on. In contrast, stride_length = L creates items only for the disjoint windows {0, 1, ..., L-1}, {L, ..., 2L - 1}, and so on. (See the construction sketch after this list.)
priority
Initial priority for new samples in the replay buffer.
pad_end_of_episodes
By default, the cache is dropped at the end of an episode. When pad_end_of_episodes = True, the cache is instead padded with boundary steps (last->first) containing 0 values everywhere, and padded items of sequence_length are written to Reverb.
tile_end_of_episodes
If pad_end_of_episodes is True, the last padded item starts with a boundary step from the episode. When this option is True, the following items are generated (F = first step, M = mid step, L = last step, P = padding step; shown for sequence_length = 4):

F, M, L, P
M, L, P, P
L, P, P, P

If False, only a single item is generated:

F, M, L, P

For training recurrent models on environments where required information is only available at the start of the episode, it is useful to set tile_end_of_episodes=False and sequence_length to the length of the longest episode.
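
As a minimal construction sketch, the code below wires the observer to a local Reverb server. This is an illustration, not the library's canonical setup: the table name, size, and selectors are arbitrary choices for the example.

    import reverb
    from tf_agents.replay_buffers import reverb_utils

    # Illustrative table; the name, size, and selectors are example choices.
    table = reverb.Table(
        name='uniform_table',
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=10000,
        rate_limiter=reverb.rate_limiters.MinSize(1),
    )
    server = reverb.Server(tables=[table])
    py_client = reverb.Client(f'localhost:{server.port}')

    observer = reverb_utils.ReverbTrajectorySequenceObserver(
        py_client=py_client,
        table_name='uniform_table',
        sequence_length=4,   # L in the stride_length discussion above
        stride_length=1,     # overlapping windows {0..3}, {1..4}, ...
        pad_end_of_episodes=True,
        tile_end_of_episodes=True,  # valid only because padding is enabled
    )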
Raises
ValueError
If tile_end_of_episodes is set without
pad_end_of_episodes.
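
Under the same assumptions as the sketch above, enabling tiling without padding fails at construction time:

    # Raises ValueError: tile_end_of_episodes requires pad_end_of_episodes.
    reverb_utils.ReverbTrajectorySequenceObserver(
        py_client=py_client,
        table_name='uniform_table',
        sequence_length=4,
        tile_end_of_episodes=True,  # pad_end_of_episodes defaults to False
    )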
Methods

close() -> None
Closes the writer of the observer.
Note: Using the observer after it is closed (and not reopened) is not supported.

flush()
Ensures that items are pushed to the service.
Note: Items are not always pushed immediately. This method is often needed when rate_limiter_timeout_ms is set for the replay buffer; calling it before learner.run() ensures that there is enough data to be consumed.

get_table_signature()

open() -> None
Opens the writer of the observer.

reset(write_cached_steps: bool = True) -> None
Resets the state of the observer.
Note: In a standard workflow, reset should be called only after all collection has finished; there is no need to call it manually between episodes.
Args
write_cached_steps
Boolean flag indicating whether to write the cached trajectory. When True, the function attempts to write the cached data before resetting (optionally with padding); otherwise, the cached data is dropped.

__call__(trajectory: Trajectory) -> None
Writes the trajectory into the underlying replay buffer. The trajectory may be a flattened trajectory; no batch dimension is allowed.
Args
trajectory
The trajectory to be written, which can be a (possibly nested) trajectory object or a flattened version of a trajectory. It is assumed to have no batch dimension.

Attributes
py_client
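
Putting the pieces together, here is a hedged end-to-end sketch that reuses the observer from the construction example: a PyDriver pushes per-step (unbatched) trajectories through the observer, and flush, reset, and close are called once collection finishes. The environment and policy are placeholders.

    from tf_agents.drivers import py_driver
    from tf_agents.environments import suite_gym
    from tf_agents.policies import random_py_policy

    # Placeholder environment and policy for the example.
    env = suite_gym.load('CartPole-v0')
    policy = random_py_policy.RandomPyPolicy(
        time_step_spec=env.time_step_spec(), action_spec=env.action_spec())

    # The driver calls the observer with single (unbatched) trajectories.
    driver = py_driver.PyDriver(
        env, policy, observers=[observer], max_episodes=2)
    driver.run(env.reset())

    observer.flush()  # ensure pending items reach the service
    observer.reset()  # write (or drop) any cached steps
    observer.close()  # the observer is unusable after this unless reopened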