- Description:
RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed with ease of use in mind: we provide the datasets with a unified API, which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established.
The datasets follow the RLDS format to represent steps and episodes.
The DeepMind Lab dataset has several levels from the challenging, partially observable DeepMind Lab suite. It was collected by training distributed R2D2 agents (Kapturowski et al., 2018) from scratch on individual tasks. We recorded the experience across all actors during entire training runs, a few times for every task. The details of the dataset generation process are described in Gulcehre et al., 2021.
We release datasets for five different DeepMind Lab levels:
seekavoid_arena_01, explore_rewards_few, explore_rewards_many,
rooms_watermaze, rooms_select_nonmatching_object. We also release
snapshot datasets for the seekavoid_arena_01 level, which we generated
from a trained R2D2 snapshot using different values of epsilon for the
epsilon-greedy algorithm when evaluating the agent in the environment.
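To make the role of epsilon concrete, here is a minimal sketch of epsilon-greedy action selection. The function name, the list of action values, and the seeding are illustrative assumptions; this is not the code used to generate the snapshot datasets.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability `epsilon` a uniformly random action is returned;
    otherwise the action with the highest estimated value is returned.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng.random() < epsilon:
        # Explore: ignore the value estimates entirely.
        return rng.randrange(len(q_values))
    # Exploit: take the greedy (argmax) action.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical action values for a task with four actions.
q = [0.1, 0.7, 0.2, 0.0]
greedy_action = epsilon_greedy(q, epsilon=0.0)  # always the argmax
noisy_action = epsilon_greedy(q, epsilon=0.5, rng=random.Random(0))
```

A larger epsilon yields more random behavior in the recorded episodes, while epsilon set to 0 reproduces the snapshot's greedy policy.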
The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you are interested in large-scale offline RL models with memory.
- Homepage: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged 
- Source code: tfds.rl_unplugged.rlu_dmlab_rooms_watermaze.RluDmlabRoomsWatermaze
- Versions:
  - 1.0.0: Initial release.
  - 1.1.0: Added is_last.
  - 1.2.0 (default): BGR -> RGB fix for pixel observations.
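As an illustration of what a BGR -> RGB fix amounts to, the sketch below reverses the channel order of a tiny image stored as nested lists. The helper name and the toy image are assumptions made for this example; it is not the code from the dataset builder.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of an image given as nested lists [H][W][3]."""
    return [[list(reversed(pixel)) for pixel in row] for row in image]


# A 1x2 toy image: one pure-blue and one pure-red pixel, in BGR order.
bgr = [[[255, 0, 0], [0, 0, 255]]]
rgb = bgr_to_rgb(bgr)
```

For a NumPy array of shape (72, 96, 3), the same channel swap can be written as `pixels[..., ::-1]`.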
 
- Download size: Unknown size
- Auto-cached (documentation): No 
- Feature structure: 
FeaturesDict({
    'episode_id': int64,
    'episode_return': float32,
    'steps': Dataset({
        'action': int64,
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'last_action': int64,
            'last_reward': float32,
            'pixels': Image(shape=(72, 96, 3), dtype=uint8),
        }),
        'reward': float32,
    }),
})
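The nested structure above can be traversed one episode at a time. The following is a minimal sketch that uses plain Python dictionaries with the same keys in place of real dataset records; the helper names and the toy values are assumptions, and a real episode's `steps` would be a nested dataset rather than a list.

```python
def summarize_episode(episode):
    """Return basic statistics for one episode laid out as in the feature structure."""
    steps = list(episode["steps"])
    if not steps:
        raise ValueError("episode has no steps")
    return {
        "episode_id": episode["episode_id"],
        "num_steps": len(steps),
        # Undiscounted sum of per-step rewards.
        "reward_sum": sum(float(step["reward"]) for step in steps),
        "starts_with_first": bool(steps[0]["is_first"]),
        "ends_with_last": bool(steps[-1]["is_last"]),
        "terminated": bool(steps[-1]["is_terminal"]),
    }


def make_step(reward, is_first=False, is_last=False, is_terminal=False):
    """Build a toy step with placeholder observation values (hypothetical data)."""
    return {
        "action": 0,
        "discount": 1.0,
        "is_first": is_first,
        "is_last": is_last,
        "is_terminal": is_terminal,
        "observation": {"last_action": 0, "last_reward": 0.0, "pixels": None},
        "reward": reward,
    }


toy_episode = {
    "episode_id": 7,
    "episode_return": 1.5,
    "steps": [
        make_step(0.0, is_first=True),
        make_step(1.0),
        make_step(0.5, is_last=True, is_terminal=True),
    ],
}
summary = summarize_episode(toy_episode)
```

In this toy episode the summed rewards match `episode_return` by construction; whether that holds for every record in the real data is not stated here and should be checked before relying on it.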
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| episode_id | Tensor | | int64 | |
| episode_return | Tensor | | float32 | |
| steps | Dataset | | | |
| steps/action | Tensor | | int64 | |
| steps/discount | Tensor | | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | FeaturesDict | | | |
| steps/observation/last_action | Tensor | | int64 | |
| steps/observation/last_reward | Tensor | | float32 | |
| steps/observation/pixels | Image | (72, 96, 3) | uint8 | |
| steps/reward | Tensor | | float32 | |
- Supervised keys (See as_supervised doc): None
- Figure (tfds.show_examples): Not supported. 
- Citation: 
@article{gulcehre2021rbve,
    title={Regularized Behavior Value Estimation},
    author={{\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
               Sergio G{\'{o}}mez Colmenarejo and
               Ziyu Wang and
               Jakub Sygnowski and
               Thomas Paine and
               Konrad Zolna and
               Yutian Chen and
               Matthew W. Hoffman and
               Razvan Pascanu and
               Nando de Freitas},
    year={2021},
    journal   = {CoRR},
    url       = {https://arxiv.org/abs/2103.09575},
    eprint={2103.09575},
    archivePrefix={arXiv},
}
rlu_dmlab_rooms_watermaze/training_0 (default config)
- Dataset size: 894.50 GiB
- Splits: 
| Split | Examples | 
|---|---|
| 'train' | 67,876 | 
rlu_dmlab_rooms_watermaze/training_1
- Dataset size: 898.74 GiB
- Splits: 
| Split | Examples | 
|---|---|
| 'train' | 66,922 | 
rlu_dmlab_rooms_watermaze/training_2
- Dataset size: 825.49 GiB
- Splits: 
| Split | Examples | 
|---|---|
| 'train' | 67,081 | 
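
A config and version can be selected by name when loading. The sketch below builds the `dataset/config:version` string accepted by `tfds.load`; the helper names are hypothetical, and it assumes `tensorflow_datasets` is installed. The import is deferred so that the name helper can be used on its own, and the loader is defined but not called because each config is several hundred GiB.

```python
CONFIGS = ("training_0", "training_1", "training_2")
DEFAULT_VERSION = "1.2.0"


def dataset_name(config="training_0", version=DEFAULT_VERSION):
    """Build the TFDS name string for one config of this dataset."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return f"rlu_dmlab_rooms_watermaze/{config}:{version}"


def load_train_split(config="training_0", version=DEFAULT_VERSION):
    """Load the 'train' split of the chosen config (triggers a very large download)."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading

    return tfds.load(dataset_name(config, version), split="train")


name = dataset_name("training_2")
```

Pinning the version in the name keeps a pipeline on the RGB-corrected pixels even if the default version changes later.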