tf_agents.replay_buffers.TFUniformReplayBuffer

A replay buffer with batched adds and uniform sampling.

Inherits From: ReplayBuffer

Args
data_spec A TensorSpec or a list/tuple/nest of TensorSpecs describing a single item that can be stored in this buffer.
batch_size Batch dimension of tensors when adding to buffer.
max_length The maximum number of items that can be stored in a single batch segment of the buffer.
scope Scope prefix for variables and ops created by this class.
device A TensorFlow device to place the Variables and ops.
table_fn Function to create tables table_fn(data_spec, capacity) that can read/write nested tensors.
dataset_drop_remainder If True, then when calling as_dataset with arguments single_deterministic_pass=True and sample_batch_size is not None, the final batch will be dropped if it does not contain exactly sample_batch_size items. This is helpful for static shape inference as the resulting tensors will always have leading dimension sample_batch_size instead of None.
dataset_window_shift Window shift used when calling as_dataset with arguments single_deterministic_pass=True and num_steps is not None. This determines how the resulting frames are windowed. If None, then there is no overlap created between frames and each frame is seen exactly once. For example, if max_length=5, num_steps=2, sample_batch_size=None, and dataset_window_shift=None, then the datasets returned will have frames {[0, 1], [2, 3], [4]}. If dataset_window_shift is not None, then consecutive windows start dataset_window_shift frames apart, and you will see each frame up to num_steps times. For example, if max_length=5, num_steps=2, sample_batch_size=None, and dataset_window_shift=1, then the datasets returned will have windows of shifted repeated frames: {[0, 1], [1, 2], [2, 3], [3, 4], [4]}. For more details, see the documentation of tf.data.Dataset.window, specifically for the shift argument. The default behavior is to not overlap frames (dataset_window_shift=None), but users often want to see all combinations of frame sequences, in which case dataset_window_shift=1 is the appropriate value.
stateful_dataset Whether the dataset contains stateful ops.
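
The windowing rule behind dataset_window_shift can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the library's implementation: make_windows is a hypothetical helper that mimics the documented behavior of tf.data.Dataset.window with drop_remainder=False, and the six frame ids are made up for illustration.

```python
def make_windows(frames, num_steps, shift=None):
    """Group frames into windows of up to num_steps items.

    Mimics tf.data.Dataset.window(size, shift) with drop_remainder=False.
    A shift of None means shift == num_steps, i.e. no overlap.
    """
    step = num_steps if shift is None else shift
    return [frames[start:start + num_steps]
            for start in range(0, len(frames), step)]


frames = list(range(6))  # six hypothetical frame ids: 0..5

# No overlap: each frame appears exactly once.
no_overlap = make_windows(frames, num_steps=3)
# Shift of 1: consecutive windows start one frame apart.
overlapping = make_windows(frames, num_steps=3, shift=1)

print(no_overlap)   # [[0, 1, 2], [3, 4, 5]]
print(overlapping)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
```

The trailing short windows are the ones that dataset_drop_remainder=True would presumably discard, which is why that flag helps static shape inference.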

Attributes
capacity Returns the capacity of the replay buffer.
data_spec Returns the spec for items in the replay buffer.
device

scope

stateful_dataset Returns whether the dataset of the replay buffer has stateful ops.
table_fn

Methods

add_batch

Adds a batch of items to the replay buffer.

Args
items An item or list/tuple/nest of items to be added to the replay buffer. items must match the data_spec of this class, with a batch_size dimension added to the beginning of each tensor/array.

Returns
An op that adds items to the replay buffer.

as_dataset

Creates and returns a dataset that returns entries from the buffer.

A single entry from the dataset is the result of the following pipeline:

  • Sample sequences from the underlying data store
  • (optionally) Process them with sequence_preprocess_fn
  • (optionally) Split them into subsequences of length num_steps
  • (optionally) Batch them into batches of size sample_batch_size

In practice, this pipeline is executed in parallel as much as possible if num_parallel_calls != 1.

Some additional notes:

If num_steps is None, different replay buffers will behave differently. For example, TFUniformReplayBuffer will return single time steps without a time dimension. In contrast, e.g., EpisodicReplayBuffer will return full sequences (since each sequence may be an episode of unknown length, the outermost shape dimension will be None).

If sample_batch_size is None, no batching is performed; and there is no outer batch dimension in the returned Dataset entries. This setting is useful with variable episode lengths using e.g. EpisodicReplayBuffer, because it allows the user to get full episodes back, and use tf.data to build padded or truncated batches themselves.

If single_deterministic_pass == True, the replay buffer will make every attempt to ensure every time step is visited once and exactly once in a deterministic manner (though true determinism depends on the underlying data store). Additional work may be done to ensure minibatches do not have multiple rows from the same episode. In some cases, this may mean arguments like num_parallel_calls are ignored.

Args
sample_batch_size (Optional.) An optional batch_size to specify the number of items to return. If None (default), a single item is returned which matches the data_spec of this class (without a batch dimension). Otherwise, a batch of sample_batch_size items is returned, where each tensor in items will have its first dimension equal to sample_batch_size and the rest of the dimensions match the corresponding data_spec.
num_steps (Optional.) Optional way to specify that sub-episodes are desired. If None (default), a batch of single items is returned. Otherwise, a batch of sub-episodes is returned, where a sub-episode is a sequence of consecutive items in the replay_buffer. The returned tensors will have first dimension equal to sample_batch_size (if sample_batch_size is not None), subsequent dimension equal to num_steps, and remaining dimensions which match the data_spec of this class.
num_parallel_calls (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process in parallel. If not specified, elements will be processed sequentially.
sequence_preprocess_fn (Optional.) Function for preprocessing the collected data before it is split into subsequences of length num_steps. Defined in TFAgent.preprocess_sequence. Defaults to pass through.
single_deterministic_pass Python boolean. If True, the dataset will return a single deterministic pass through its underlying data. NOTE: If the buffer is modified while a Dataset iterator is iterating over this data, the iterator may miss any new data or otherwise have subtly invalid data.

Returns
A dataset of type tf.data.Dataset, elements of which are 2-tuples of:

  • An item or sequence of items or batch thereof
  • Auxiliary info for the items (i.e. ids, probs).

Raises
NotImplementedError If a non-default argument value is not supported.
ValueError If the data spec contains lists that must be converted to tuples.

clear

Resets the contents of replay buffer.

Returns
An op that clears the replay buffer contents.

gather_all

Returns all the items in buffer. (deprecated)

Returns
Returns all the items currently in the buffer. Returns a tensor of shape [B, T, ...] where B = batch size, T = timesteps, and the remaining shape is the shape spec of the items in the buffer.

get_next

Returns an item or batch of items from the buffer. (deprecated)

Args
sample_batch_size (Optional.) An optional batch_size to specify the number of items to return. If None (default), a single item is returned which matches the data_spec of this class (without a batch dimension). Otherwise, a batch of sample_batch_size items is returned, where each tensor in items will have its first dimension equal to sample_batch_size and the rest of the dimensions match the corresponding data_spec. See examples below.
num_steps (Optional.) Optional way to specify that sub-episodes are desired. If None (default), in non-episodic replay buffers, a batch of single items is returned. In episodic buffers, full episodes are returned (note that sample_batch_size must be None in that case). Otherwise, a batch of sub-episodes is returned, where a sub-episode is a sequence of consecutive items in the replay_buffer. The returned tensors will have first dimension equal to sample_batch_size (if sample_batch_size is not None), subsequent dimension equal to num_steps (if time_stacked=True), and remaining dimensions which match the data_spec of this class. See examples below.
time_stacked (Optional.) Boolean, when true and num_steps > 1 it returns the items stacked on the time dimension. See examples below for details.

Examples of tensor shapes returned (B = batch size, T = timestep, D = data spec):

  • get_next(sample_batch_size=None, num_steps=None, time_stacked=True): return shape (non-episodic): [D]; return shape (episodic): [T, D]
  • get_next(sample_batch_size=B, num_steps=None, time_stacked=True): return shape (non-episodic): [B, D]; return shape (episodic): Not supported
  • get_next(sample_batch_size=B, num_steps=T, time_stacked=True): return shape: [B, T, D]
  • get_next(sample_batch_size=None, num_steps=T, time_stacked=False): return shape: ([D], [D], ..), T tensors in the tuple
  • get_next(sample_batch_size=B, num_steps=T, time_stacked=False): return shape: ([B, D], [B, D], ..), T tensors in the tuple

Returns
A 2-tuple containing:

  • An item or sequence of (optionally batched and stacked) items.
  • Auxiliary info for the items (i.e. ids, probs).
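
The difference between the time_stacked=True and time_stacked=False layouts can be pictured with plain NumPy arrays. This sketch illustrates the shape relationship only and does not call get_next; B, T, and D are hypothetical sizes.

```python
import numpy as np

B, T, D = 4, 3, 5  # hypothetical batch size, number of steps, item size

# time_stacked=False layout: a tuple of T arrays, each of shape [B, D].
unstacked = tuple(np.full((B, D), t, dtype=np.float32) for t in range(T))

# time_stacked=True layout: one array of shape [B, T, D].
stacked = np.stack(unstacked, axis=1)

print(len(unstacked), unstacked[0].shape)  # 3 (4, 5)
print(stacked.shape)                       # (4, 3, 5)
```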

num_frames

Returns the number of frames in the replay buffer.