RL Unplugged is a suite of benchmarks for offline reinforcement learning. It is designed with ease of use in mind: the datasets are provided through a unified API, so that once a general pipeline has been established, the practitioner can work with all of the data in the suite.

The datasets follow the RLDS format to represent steps and episodes.
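To make the episode/step nesting concrete, here is a minimal sketch that builds a tiny synthetic episode in an RLDS-like layout and walks over its steps. The field names follow the feature structure listed on this page; the values, the helper names (`make_toy_episode`, `undiscounted_return`) and the use of plain Python dicts are assumptions for illustration, not real data or the library's own API.

```python
import numpy as np


def make_toy_episode(num_steps=4):
    """Build a synthetic episode in an RLDS-like layout (made-up values)."""
    steps = []
    for t in range(num_steps):
        steps.append({
            "observation": {"pixels": np.zeros((72, 96, 3), dtype=np.uint8)},
            "action": t % 3,
            "reward": 1.0 if t == num_steps - 2 else 0.0,
            "discount": 1.0,
            "is_first": t == 0,             # True only on the first step
            "is_last": t == num_steps - 1,  # True only on the final step
            "is_terminal": False,
        })
    # An episode is a record of episode-level fields plus a sequence of steps.
    return {"episode_id": 0, "steps": steps}


def undiscounted_return(episode):
    """Sum the per-step rewards of one episode."""
    return float(sum(step["reward"] for step in episode["steps"]))


episode = make_toy_episode()
print(len(episode["steps"]), undiscounted_return(episode))
```

In the real dataset the steps are a nested dataset rather than a Python list, but the same two-level structure applies.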

The DeepMind Lab dataset contains several levels from the challenging, partially observable DeepMind Lab suite. It was collected by training distributed R2D2 agents (Kapturowski et al., 2018) from scratch on individual tasks. We recorded the experience across all actors during entire training runs, repeating this a few times for every task. The details of the dataset generation process are described in Gulcehre et al., 2021.

We release datasets for five different DeepMind Lab levels: seekavoid_arena_01, explore_rewards_few, explore_rewards_many, rooms_watermaze, rooms_select_nonmatching_object. We also release snapshot datasets for the seekavoid_arena_01 level, which we generated from a trained R2D2 snapshot evaluated in the environment with different values of epsilon for the epsilon-greedy policy.
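As a reminder of what the epsilon parameter controls, the sketch below implements generic epsilon-greedy action selection: with probability epsilon a uniformly random action is taken, otherwise the action with the highest estimated value. The Q-values and the epsilon values in the loop are hypothetical, and this is not the code used to generate the snapshot datasets.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.2])  # hypothetical action values
for epsilon in (0.0, 0.1, 0.25):      # illustrative values only
    actions = [epsilon_greedy(q_values, epsilon, rng) for _ in range(1000)]
    greedy_fraction = np.mean(np.array(actions) == 1)
    print(f"epsilon={epsilon}: greedy action chosen {greedy_fraction:.2f} of the time")
```

A larger epsilon yields more exploratory, lower-quality behaviour data from the same trained agent.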

The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you are interested in large-scale offline RL models with memory.
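Models with memory are commonly trained on fixed-length sequences rather than single transitions. The following is a minimal sketch of cutting one episode's per-step array into overlapping windows; the function name, window length and stride are assumptions chosen for illustration and do not come from the dataset's own pipeline.

```python
import numpy as np


def split_into_sequences(per_step_values, seq_len, stride):
    """Cut one episode's per-step array into fixed-length windows."""
    values = np.asarray(per_step_values)
    # Only windows that fit entirely inside the episode are kept.
    starts = range(0, len(values) - seq_len + 1, stride)
    return [values[s:s + seq_len] for s in starts]


rewards = np.arange(10)  # stand-in for one episode's per-step rewards
windows = split_into_sequences(rewards, seq_len=4, stride=2)
print(len(windows), windows[-1])
```

Episodes shorter than `seq_len` produce no windows in this sketch; a real pipeline would need to decide whether to pad or drop them.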

Split      Examples
'train'    89,144
  • Feature structure:
    'episode_id': int64,
    'episode_return': float32,
    'steps': Dataset({
        'action': int64,
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'last_action': int64,
            'last_reward': float32,
            'pixels': Image(shape=(72, 96, 3), dtype=uint8),
        }),
        'reward': float32,
    }),
  • Feature documentation:
Feature                          Class          Shape          Dtype      Description
episode_id                       Tensor                        int64
episode_return                   Tensor                        float32
steps                            Dataset
steps/action                     Tensor                        int64
steps/discount                   Tensor                        float32
steps/is_first                   Tensor                        bool
steps/is_last                    Tensor                        bool
steps/is_terminal                Tensor                        bool
steps/observation                FeaturesDict
steps/observation/last_action    Tensor                        int64
steps/observation/last_reward    Tensor                        float32
steps/observation/pixels         Image          (72, 96, 3)    uint8
steps/reward                     Tensor                        float32
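The feature listing above can be turned into training arrays roughly as follows. This is a sketch under stated assumptions: `load_episodes` and `stack_steps` are hypothetical helper names, the dataset name is not given in this section and must be supplied by the caller, and the loader assumes TensorFlow Datasets is installed. The demonstration at the bottom uses synthetic steps so that it runs without downloading anything.

```python
import numpy as np


def load_episodes(dataset_name, split="train"):
    """Load episodes through TFDS; the caller must supply `dataset_name`."""
    import tensorflow_datasets as tfds  # imported lazily; requires TFDS
    return tfds.load(dataset_name, split=split)


def stack_steps(steps):
    """Stack step dicts into per-field arrays with a leading time axis."""
    steps = list(steps)
    return {
        "pixels": np.stack([s["observation"]["pixels"] for s in steps]),
        "action": np.array([s["action"] for s in steps], dtype=np.int64),
        "reward": np.array([s["reward"] for s in steps], dtype=np.float32),
        "discount": np.array([s["discount"] for s in steps], dtype=np.float32),
        "is_last": np.array([s["is_last"] for s in steps], dtype=bool),
    }


# Synthetic stand-in for one episode's steps (made-up values).
fake_steps = [
    {
        "observation": {"pixels": np.zeros((72, 96, 3), dtype=np.uint8)},
        "action": 0,
        "reward": 0.0,
        "discount": 1.0,
        "is_last": t == 2,
    }
    for t in range(3)
]
batch = stack_steps(fake_steps)
print({name: array.shape for name, array in batch.items()})

# With the real data, one would expect usage along these lines:
#   for episode in load_episodes(name_of_the_dataset):
#       batch = stack_steps(episode["steps"].as_numpy_iterator())
```

Stacking a whole episode at once is only practical for short episodes; for long ones, iterating over the steps lazily avoids holding every frame in memory.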
  • Citation:
    title={Regularized Behavior Value Estimation},
    author={{\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
               Sergio G{\'{o}}mez Colmenarejo and
               Ziyu Wang and
               Jakub Sygnowski and
               Thomas Paine and
               Konrad Zolna and
               Yutian Chen and
               Matthew W. Hoffman and
               Razvan Pascanu and
               Nando de Freitas},
    journal   = {CoRR},
    url       = {https://arxiv.org/abs/2103.09575},

