tf_agents.replay_buffers.ReverbReplayBuffer

Reverb ReplayBuffer exposed as a TF-Agents replay buffer.

Inherits From: ReplayBuffer

Main aliases

tf_agents.replay_buffers.reverb_replay_buffer.ReverbReplayBuffer

tf_agents.replay_buffers.ReverbReplayBuffer(
    data_spec,
    table_name,
    sequence_length,
    server_address=None,
    local_server=None,
    dataset_buffer_size=None,
    max_cycle_length=32,
    num_workers_per_iterator=-1,
    max_samples_per_stream=-1,
    rate_limiter_timeout_ms=-1
)

Used in the notebooks

Used in the tutorials
Train a Deep Q Network with TF-Agents
REINFORCE agent
SAC minitaur with the Actor-Learner API

Args
`data_spec`	Spec for the data held in the replay buffer.
`table_name`	Name of the table that will be sampled.
`sequence_length`	(Can be set to `None`, i.e., unknown.) The number of timesteps that each sample consists of. If not `None`, the lengths of samples received from the server are validated against this number. NOTE: This replay buffer is most performant when the `sequence_length` given here equals the `num_steps` passed to `as_dataset` and is also the length used when writing to the replay buffer (for example, see the `sequence_lengths` argument of the `Reverb.*Observer` classes).
`server_address`	(Optional) Address of the reverb replay server. One of `server_address` or `local_server` must be provided.
`local_server`	(Optional) An instance of `reverb.Server` that holds the replay's data.
`dataset_buffer_size`	(Optional) The prefetch buffer size (in number of items) of the Reverb Dataset object. A good rule of thumb is to set this value to 2-3x the `sample_batch_size` you will use.
`max_cycle_length`	(Optional) The number of sequences used to populate the batches of `as_dataset`. By default, `min(32, sample_batch_size)` is used, but the number can be between `1` and `sample_batch_size`.
`num_workers_per_iterator`	(Defaults to -1, i.e., auto-selected) The number of worker threads to create per dataset iterator. When the selected table uses a FIFO or Heap sampler (i.e., a queue), exactly 1 worker must be used to avoid races that cause invalid ordering of items. For all other samplers, this value should be roughly equal to the number of threads available on the CPU.
`max_samples_per_stream`	(Defaults to -1, i.e., auto-selected) The maximum number of samples to fetch from a stream before a new call is made. Keeping this number low ensures that the data is fetched uniformly from all servers.
`rate_limiter_timeout_ms`	(Defaults to -1: infinite). Timeout (in milliseconds) to wait on the rate limiter when sampling from the table. If `rate_limiter_timeout_ms >= 0`, this is the timeout passed to `Table::Sample` describing how long to wait for the rate limiter to allow sampling.
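
The tuning guidance in the argument descriptions above can be collected into a small helper. This is only an illustrative sketch: `suggest_buffer_kwargs` and its parameters are hypothetical names, not part of TF-Agents, and the factor of 3 is just the upper end of the "2-3x" rule of thumb.

```python
import os


def suggest_buffer_kwargs(sample_batch_size, sampler_is_queue, cpu_threads=None):
    """Hypothetical helper: derive constructor kwargs from the documented guidance."""
    if sample_batch_size < 1:
        raise ValueError("sample_batch_size must be >= 1")
    if cpu_threads is None:
        # Assumption: use the logical CPU count as "threads available on the CPU".
        cpu_threads = os.cpu_count() or 1
    return {
        # Rule of thumb from the docs: 2-3x the sample batch size (3x chosen here).
        "dataset_buffer_size": 3 * sample_batch_size,
        # Documented default: min(32, sample_batch_size).
        "max_cycle_length": min(32, sample_batch_size),
        # FIFO/Heap (queue-like) samplers need exactly one worker.
        "num_workers_per_iterator": 1 if sampler_is_queue else cpu_threads,
    }


queue_kwargs = suggest_buffer_kwargs(64, sampler_is_queue=True, cpu_threads=8)
uniform_kwargs = suggest_buffer_kwargs(16, sampler_is_queue=False, cpu_threads=8)
print(queue_kwargs)
print(uniform_kwargs)
```

The resulting dictionary could be unpacked into the constructor call alongside `data_spec`, `table_name`, and `sequence_length`; leaving the arguments at their defaults lets the library pick values itself.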

Attributes
`capacity`	Returns the capacity of the replay buffer.
`data_spec`	Returns the spec for items in the replay buffer.
`local_server`
`py_client`
`stateful_dataset`	Returns whether the dataset of the replay buffer has stateful ops.
`tf_client`

Methods

`add_batch`

add_batch(
    items
)

Adds a batch of items to the replay buffer.

Args
`items`	Ignored.

Returns
Nothing.

Raises: NotImplementedError
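
Because `add_batch` is not implemented, data is normally written to the Reverb table by other means, such as the `Reverb.*Observer` classes mentioned in the constructor arguments. The sketch below follows the pattern used in the TF-Agents DQN tutorial; the function name `build_buffer_and_observer`, the table name, and the default sizes are illustrative assumptions, and the imports are deferred so the snippet can be loaded even where `reverb` and `tf_agents` are not installed.

```python
def build_buffer_and_observer(collect_data_spec, table_name="uniform_table",
                              max_size=100_000, sequence_length=2):
    """Sketch: local Reverb server, replay buffer, and a writing observer."""
    # Deferred imports: both packages are heavy, optional dependencies.
    import reverb
    from tf_agents.replay_buffers import reverb_replay_buffer
    from tf_agents.replay_buffers import reverb_utils
    from tf_agents.specs import tensor_spec

    # The table signature is the data spec with an outer (time) dimension added.
    signature = tensor_spec.add_outer_dim(tensor_spec.from_spec(collect_data_spec))

    # Assumption: uniform sampling with FIFO eviction, as in the DQN tutorial.
    table = reverb.Table(
        table_name,
        max_size=max_size,
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        rate_limiter=reverb.rate_limiters.MinSize(1),
        signature=signature)
    server = reverb.Server([table])

    replay_buffer = reverb_replay_buffer.ReverbReplayBuffer(
        collect_data_spec,
        table_name=table_name,
        sequence_length=sequence_length,
        local_server=server)

    # The observer writes trajectories through the buffer's Python client.
    observer = reverb_utils.ReverbAddTrajectoryObserver(
        replay_buffer.py_client,
        table_name,
        sequence_length=sequence_length)
    return replay_buffer, observer
```

In a typical setup, `collect_data_spec` would come from an agent (for example `agent.collect_data_spec`), and the returned observer would be passed to a driver or actor so that collected trajectories land in the table. Using the same `sequence_length` for the buffer and the observer follows the performance note in the constructor arguments.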

`as_dataset`

as_dataset(
    sample_batch_size=None,
    num_steps=None,
    num_parallel_calls=None,
    sequence_preprocess_fn=None,
    single_deterministic_pass=False
)

Creates and returns a dataset that returns entries from the buffer.

A single entry from the dataset is the result of the following pipeline:

Sample sequences from the underlying data store
(optionally) Process them with sequence_preprocess_fn,
(optionally) Split them into subsequences of length num_steps
(optionally) Batch them into batches of size sample_batch_size.

In practice, this pipeline is executed in parallel as much as possible if num_parallel_calls != 1.
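
The four pipeline stages above can be pictured with a toy, pure-Python model that uses lists instead of tensors. This is not how the library implements `as_dataset`; the function `sample_pipeline`, the non-overlapping split, and the dropping of incomplete subsequences and batches are all assumptions made for illustration.

```python
import random


def sample_pipeline(store, num_sequences, sequence_preprocess_fn=None,
                    num_steps=None, sample_batch_size=None, seed=0):
    """Toy model of the documented pipeline, operating on plain lists."""
    rng = random.Random(seed)
    # 1. Sample sequences from the underlying data store.
    items = [rng.choice(store) for _ in range(num_sequences)]
    # 2. Optionally preprocess each full sequence.
    if sequence_preprocess_fn is not None:
        items = [sequence_preprocess_fn(seq) for seq in items]
    # 3. Optionally split into subsequences of length num_steps
    #    (assumption: non-overlapping, remainder dropped).
    if num_steps is not None:
        items = [seq[i:i + num_steps]
                 for seq in items
                 for i in range(0, len(seq) - num_steps + 1, num_steps)]
    # 4. Optionally group into batches of size sample_batch_size
    #    (assumption: incomplete final batch dropped).
    if sample_batch_size is not None:
        items = [items[i:i + sample_batch_size]
                 for i in range(0, len(items) - sample_batch_size + 1,
                                sample_batch_size)]
    return items


store = [[0, 1, 2, 3], [10, 11, 12, 13]]  # two stored sequences of length 4
batches = sample_pipeline(
    store,
    num_sequences=4,
    sequence_preprocess_fn=lambda seq: [2 * x for x in seq],
    num_steps=2,
    sample_batch_size=2)
print(len(batches), batches[0])
```

With four sampled sequences of length 4, splitting at `num_steps=2` yields eight subsequences, which are then grouped into four batches of two.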

Some additional notes:

If num_steps is None, different replay buffers will behave differently. For example, TFUniformReplayBuffer will return single time steps without a time dimension. In contrast, e.g., EpisodicReplayBuffer will return full sequences (since each sequence may be an episode of unknown length, the outermost shape dimension will be None).

If sample_batch_size is None, no batching is performed; and there is no outer batch dimension in the returned Dataset entries. This setting is useful with variable episode lengths using e.g. EpisodicReplayBuffer, because it allows the user to get full episodes back, and use tf.data to build padded or truncated batches themselves.

If single_deterministic_pass == True, the replay buffer will make every attempt to ensure every time step is visited once and exactly once in a deterministic manner (though true determinism depends on the underlying data store). Additional work may be done to ensure minibatches do not have multiple rows from the same episode. In some cases, this may mean arguments like num_parallel_calls are ignored.

Args
`sample_batch_size`	(Optional.) An optional batch_size to specify the number of items to return. If None (default), a single item is returned which matches the data_spec of this class (without a batch dimension). Otherwise, a batch of sample_batch_size items is returned, where each tensor in items will have its first dimension equal to sample_batch_size and the rest of the dimensions match the corresponding data_spec.
`num_steps`	(Optional.) Optional way to specify that sub-episodes are desired. If None (default), a batch of single items is returned. Otherwise, a batch of sub-episodes is returned, where a sub-episode is a sequence of consecutive items in the replay_buffer. The returned tensors will have first dimension equal to sample_batch_size (if sample_batch_size is not None), subsequent dimension equal to num_steps, and remaining dimensions which match the data_spec of this class.
`num_parallel_calls`	(Optional.) A `tf.int32` scalar `tf.Tensor`, representing the number of elements to process in parallel. If not specified, elements will be processed sequentially.
`sequence_preprocess_fn`	(Optional) fn for preprocessing the collected data before it is split into subsequences of length `num_steps`. Defined in `TFAgent.preprocess_sequence`. Defaults to pass through.
`single_deterministic_pass`	Python boolean. If `True`, the dataset will return a single deterministic pass through its underlying data. NOTE: If the buffer is modified while a Dataset iterator is iterating over this data, the iterator may miss any new data or otherwise have subtly invalid data.
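
The dimension ordering described for `sample_batch_size` and `num_steps` can be summarized with a small shape calculation. The helper `expected_item_shape` is hypothetical and follows only the argument descriptions above; as noted earlier, the actual behavior when `num_steps` is `None` differs between replay buffer implementations.

```python
def expected_item_shape(spec_shape, sample_batch_size=None, num_steps=None):
    """Hypothetical helper: shape of one returned tensor for a given spec shape."""
    leading = []
    if sample_batch_size is not None:
        leading.append(sample_batch_size)  # outer batch dimension
    if num_steps is not None:
        leading.append(num_steps)          # time dimension of the sub-episode
    return tuple(leading) + tuple(spec_shape)


image_spec_shape = (84, 84, 4)  # assumed example observation shape
batched_shape = expected_item_shape(image_spec_shape, sample_batch_size=32, num_steps=2)
time_only_shape = expected_item_shape(image_spec_shape, num_steps=2)
single_shape = expected_item_shape(image_spec_shape)
print(batched_shape, time_only_shape, single_shape)
```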

Returns

A dataset of type tf.data.Dataset, elements of which are 2-tuples of:

An item or sequence of items or batch thereof
Auxiliary info for the items (i.e. ids, probs).

Raises
`NotImplementedError`	If a non-default argument value is not supported.
`ValueError`	If the data spec contains lists that must be converted to tuples.

`clear`

clear()

Resets the contents of the replay buffer.

Returns
Clears the replay buffer contents.

`gather_all`

gather_all()

Returns all the items in the buffer.

Returns
Nothing.

Raises
NotImplementedError

`get_next`

get_next(
    sample_batch_size=None, num_steps=None, time_stacked=True
)

Returns an item or batch of items from the buffer.

Args
`sample_batch_size`	Ignored.
`num_steps`	Ignored.
`time_stacked`	Ignored.

Returns
Nothing.

Raises
NotImplementedError

`get_table_info`

get_table_info()

`num_frames`

num_frames()

Returns the number of frames in the replay buffer.

`update_priorities`

update_priorities(
    keys, priorities
)

Updates the priorities for the given keys.
