- Description:
RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following consideration: to facilitate ease of use, we provide the datasets with a unified API, which makes it easy for practitioners to work with all data in the suite once a general pipeline has been established.
The datasets follow the RLDS format to represent steps and episodes.
These tasks consist of the corridor locomotion tasks involving the CMU Humanoid, for which prior efforts have either used motion capture data (Merel et al., 2019a; Merel et al., 2019b) or trained from scratch (Song et al., 2020). In addition, the DM Locomotion repository contains a set of tasks adapted to suit a virtual rodent (Merel et al., 2020). We emphasize that the DM Locomotion tasks combine challenging high-DoF continuous control with perception from rich egocentric observations. For details on how the dataset was generated, please refer to the paper.
We recommend trying offline RL methods on the DeepMind Locomotion dataset if you are interested in very challenging offline RL datasets with continuous action spaces.
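A minimal sketch of walking one RLDS episode, assuming each step has already been converted to a plain Python dict carrying the `reward`, `is_first` and `is_last` fields listed in the feature structures. The helper name `summarize_episode` is hypothetical and is not part of TFDS or RLDS. Whether the reward stored on the `is_last` step is meaningful depends on how the data was converted; this sketch simply sums every step's reward.

```python
from typing import Any, Dict, Iterable, Tuple

Step = Dict[str, Any]


def summarize_episode(steps: Iterable[Step]) -> Tuple[int, float]:
    """Return (number of steps, undiscounted sum of rewards) for one episode.

    Hypothetical helper: also checks the RLDS boundary flags, i.e. that the
    first step has is_first=True and that the episode ends with is_last=True.
    """
    num_steps = 0
    total_reward = 0.0
    last_seen = False
    for step in steps:
        if num_steps == 0 and not step["is_first"]:
            raise ValueError("first step must have is_first=True")
        if last_seen:
            raise ValueError("found a step after is_last=True")
        total_reward += float(step["reward"])
        num_steps += 1
        last_seen = bool(step["is_last"])
    if num_steps and not last_seen:
        raise ValueError("episode does not end with is_last=True")
    return num_steps, total_reward
```

With TFDS installed, episodes should be loadable with `tfds.load('rlu_locomotion/humanoid_corridor', split='train')`; each element's `steps` entry is itself a `tf.data.Dataset`, so `summarize_episode(tfds.as_numpy(episode['steps']))` would apply the helper to one episode. This requires downloading the data first.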
Homepage: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged
Source code: tfds.rl_unplugged.rlu_locomotion.RluLocomotion
Versions:
- 1.0.0 (default): Initial release.
Download size: Unknown size
Auto-cached (documentation): No
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
@inproceedings{gulcehre2020rl,
title = {RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning},
author = {Gulcehre, Caglar and Wang, Ziyu and Novikov, Alexander and Paine, Thomas and G{\'o}mez, Sergio and Zolna, Konrad and Agarwal, Rishabh and Merel, Josh S and Mankowitz, Daniel J and Paduraru, Cosmin and Dulac-Arnold, Gabriel and Li, Jerry and Norouzi, Mohammad and Hoffman, Matthew and Heess, Nicolas and de Freitas, Nando},
booktitle = {Advances in Neural Information Processing Systems},
pages = {7248--7259},
volume = {33},
year = {2020}
}
rlu_locomotion/humanoid_corridor (default config)
Dataset size: 1.88 GiB
Splits:
Split | Examples |
---|---|
'train' | 4,000 |
- Feature structure:
FeaturesDict({
'episode_id': int64,
'steps': Dataset({
'action': Tensor(shape=(56,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': FeaturesDict({
'walker': FeaturesDict({
'body_height': Tensor(shape=(1,), dtype=float32),
'egocentric_camera': Image(shape=(64, 64, 3), dtype=uint8),
'end_effectors_pos': Tensor(shape=(12,), dtype=float32),
'joints_pos': Tensor(shape=(56,), dtype=float32),
'joints_vel': Tensor(shape=(56,), dtype=float32),
'sensors_accelerometer': Tensor(shape=(3,), dtype=float32),
'sensors_gyro': Tensor(shape=(3,), dtype=float32),
'sensors_velocimeter': Tensor(shape=(3,), dtype=float32),
'world_zaxis': Tensor(shape=(3,), dtype=float32),
}),
}),
'reward': float32,
}),
'timestamp': int64,
})
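The humanoid `walker` observation mixes a 64×64 RGB camera image with several low-dimensional proprioceptive vectors. Many offline RL baselines consume a single state vector, so a minimal sketch (assuming NumPy arrays and a hypothetical `flatten_walker` helper, with the camera deliberately ignored) concatenates the vector features in a fixed order. With the shapes listed above this gives 1 + 12 + 56 + 56 + 3 + 3 + 3 + 3 = 137 values per step.

```python
from typing import Dict, Sequence

import numpy as np

# Proprioceptive keys of the humanoid 'walker' observation, in a fixed
# (alphabetical) order. 'egocentric_camera' is intentionally excluded.
HUMANOID_PROPRIO_KEYS = (
    "body_height",
    "end_effectors_pos",
    "joints_pos",
    "joints_vel",
    "sensors_accelerometer",
    "sensors_gyro",
    "sensors_velocimeter",
    "world_zaxis",
)


def flatten_walker(walker: Dict[str, np.ndarray],
                   keys: Sequence[str] = HUMANOID_PROPRIO_KEYS) -> np.ndarray:
    """Concatenate the selected walker features into one float32 vector."""
    parts = [np.asarray(walker[k], dtype=np.float32).reshape(-1) for k in keys]
    return np.concatenate(parts)
```

Fixing the key order matters: a learned policy or critic expects each index of the state vector to mean the same thing for every step.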
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
episode_id | Tensor | | int64 | |
steps | Dataset | | | |
steps/action | Tensor | (56,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | FeaturesDict | | | |
steps/observation/walker | FeaturesDict | | | |
steps/observation/walker/body_height | Tensor | (1,) | float32 | |
steps/observation/walker/egocentric_camera | Image | (64, 64, 3) | uint8 | |
steps/observation/walker/end_effectors_pos | Tensor | (12,) | float32 | |
steps/observation/walker/joints_pos | Tensor | (56,) | float32 | |
steps/observation/walker/joints_vel | Tensor | (56,) | float32 | |
steps/observation/walker/sensors_accelerometer | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_gyro | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_velocimeter | Tensor | (3,) | float32 | |
steps/observation/walker/world_zaxis | Tensor | (3,) | float32 | |
steps/reward | Tensor | | float32 | |
timestamp | Tensor | | int64 | |
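Most offline RL algorithms train on single transitions rather than whole episodes. The sketch below assumes the RLDS convention that step t holds the observation s_t, the action a_t taken there, and the reward and discount that followed that action; it pairs each step with its successor to form (s, a, r, discount, s') tuples, so the action and reward stored on the final `is_last` step go unused. `to_transitions` is a hypothetical helper, and its `terminal` flag simply copies `is_terminal` from the next step, so a truncated episode (a last step without `is_terminal`) can still be bootstrapped.

```python
from typing import Any, Dict, List

Step = Dict[str, Any]


def to_transitions(steps: List[Step]) -> List[Dict[str, Any]]:
    """Pair consecutive RLDS steps into (s, a, r, discount, s') transitions."""
    transitions = []
    for current, nxt in zip(steps[:-1], steps[1:]):
        transitions.append({
            "observation": current["observation"],
            "action": current["action"],
            "reward": current["reward"],
            "discount": current["discount"],
            "next_observation": nxt["observation"],
            # True only if the episode genuinely terminated at s'
            # (as opposed to being cut off, e.g. by a time limit).
            "terminal": bool(nxt["is_terminal"]),
        })
    return transitions
```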
- Examples (tfds.as_dataframe):
rlu_locomotion/humanoid_gaps
Dataset size: 4.57 GiB
Splits:
Split | Examples |
---|---|
'train' | 8,000 |
- Feature structure:
FeaturesDict({
'episode_id': int64,
'steps': Dataset({
'action': Tensor(shape=(56,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': FeaturesDict({
'walker': FeaturesDict({
'body_height': Tensor(shape=(1,), dtype=float32),
'egocentric_camera': Image(shape=(64, 64, 3), dtype=uint8),
'end_effectors_pos': Tensor(shape=(12,), dtype=float32),
'joints_pos': Tensor(shape=(56,), dtype=float32),
'joints_vel': Tensor(shape=(56,), dtype=float32),
'sensors_accelerometer': Tensor(shape=(3,), dtype=float32),
'sensors_gyro': Tensor(shape=(3,), dtype=float32),
'sensors_velocimeter': Tensor(shape=(3,), dtype=float32),
'world_zaxis': Tensor(shape=(3,), dtype=float32),
}),
}),
'reward': float32,
}),
'timestamp': int64,
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
episode_id | Tensor | | int64 | |
steps | Dataset | | | |
steps/action | Tensor | (56,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | FeaturesDict | | | |
steps/observation/walker | FeaturesDict | | | |
steps/observation/walker/body_height | Tensor | (1,) | float32 | |
steps/observation/walker/egocentric_camera | Image | (64, 64, 3) | uint8 | |
steps/observation/walker/end_effectors_pos | Tensor | (12,) | float32 | |
steps/observation/walker/joints_pos | Tensor | (56,) | float32 | |
steps/observation/walker/joints_vel | Tensor | (56,) | float32 | |
steps/observation/walker/sensors_accelerometer | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_gyro | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_velocimeter | Tensor | (3,) | float32 | |
steps/observation/walker/world_zaxis | Tensor | (3,) | float32 | |
steps/reward | Tensor | | float32 | |
timestamp | Tensor | | int64 | |
- Examples (tfds.as_dataframe):
rlu_locomotion/humanoid_walls
Dataset size: 2.36 GiB
Splits:
Split | Examples |
---|---|
'train' | 4,000 |
- Feature structure:
FeaturesDict({
'episode_id': int64,
'steps': Dataset({
'action': Tensor(shape=(56,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': FeaturesDict({
'walker': FeaturesDict({
'body_height': Tensor(shape=(1,), dtype=float32),
'egocentric_camera': Image(shape=(64, 64, 3), dtype=uint8),
'end_effectors_pos': Tensor(shape=(12,), dtype=float32),
'joints_pos': Tensor(shape=(56,), dtype=float32),
'joints_vel': Tensor(shape=(56,), dtype=float32),
'sensors_accelerometer': Tensor(shape=(3,), dtype=float32),
'sensors_gyro': Tensor(shape=(3,), dtype=float32),
'sensors_velocimeter': Tensor(shape=(3,), dtype=float32),
'world_zaxis': Tensor(shape=(3,), dtype=float32),
}),
}),
'reward': float32,
}),
'timestamp': int64,
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
episode_id | Tensor | | int64 | |
steps | Dataset | | | |
steps/action | Tensor | (56,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | FeaturesDict | | | |
steps/observation/walker | FeaturesDict | | | |
steps/observation/walker/body_height | Tensor | (1,) | float32 | |
steps/observation/walker/egocentric_camera | Image | (64, 64, 3) | uint8 | |
steps/observation/walker/end_effectors_pos | Tensor | (12,) | float32 | |
steps/observation/walker/joints_pos | Tensor | (56,) | float32 | |
steps/observation/walker/joints_vel | Tensor | (56,) | float32 | |
steps/observation/walker/sensors_accelerometer | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_gyro | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_velocimeter | Tensor | (3,) | float32 | |
steps/observation/walker/world_zaxis | Tensor | (3,) | float32 | |
steps/reward | Tensor | | float32 | |
timestamp | Tensor | | int64 | |
- Examples (tfds.as_dataframe):
rlu_locomotion/rodent_bowl_escape
Dataset size: 16.46 GiB
Splits:
Split | Examples |
---|---|
'train' | 2,000 |
- Feature structure:
FeaturesDict({
'episode_id': int64,
'steps': Dataset({
'action': Tensor(shape=(38,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': FeaturesDict({
'walker': FeaturesDict({
'appendages_pos': Tensor(shape=(15,), dtype=float32),
'egocentric_camera': Image(shape=(64, 64, 3), dtype=uint8),
'joints_pos': Tensor(shape=(30,), dtype=float32),
'joints_vel': Tensor(shape=(30,), dtype=float32),
'sensors_accelerometer': Tensor(shape=(3,), dtype=float32),
'sensors_gyro': Tensor(shape=(3,), dtype=float32),
'sensors_touch': Tensor(shape=(4,), dtype=float32),
'sensors_velocimeter': Tensor(shape=(3,), dtype=float32),
'tendons_pos': Tensor(shape=(8,), dtype=float32),
'tendons_vel': Tensor(shape=(8,), dtype=float32),
'world_zaxis': Tensor(shape=(3,), dtype=float32),
}),
}),
'reward': float32,
}),
'timestamp': int64,
})
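The rodent configs expose a different set of `walker` keys (adding `appendages_pos`, `sensors_touch` and the tendon features) together with a 38-dimensional action. Rather than hard-coding a key list per config, a sketch can separate image features from vector features by array rank, assuming the camera is the only rank-3 feature, and concatenate the vector features in sorted key order. For the shapes above, the vector part has 15 + 30 + 30 + 3 + 3 + 4 + 3 + 8 + 8 + 3 = 107 entries. `split_walker` is a hypothetical helper, not a TFDS function.

```python
from typing import Dict, Tuple

import numpy as np


def split_walker(
        walker: Dict[str, np.ndarray]) -> Tuple[Dict[str, np.ndarray], np.ndarray]:
    """Separate rank-3 (image) features from vector features.

    Returns the image features unchanged in a dict, plus all remaining
    features concatenated, in sorted key order, into one float32 vector.
    """
    images = {k: v for k, v in walker.items() if np.ndim(v) == 3}
    vector_keys = sorted(k for k, v in walker.items() if np.ndim(v) != 3)
    proprio = np.concatenate(
        [np.asarray(walker[k], dtype=np.float32).reshape(-1) for k in vector_keys])
    return images, proprio
```

The same function also applies to the humanoid configs, whose only rank-3 feature is likewise `egocentric_camera`; the image dict can then feed a convolutional encoder while the vector feeds an MLP.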
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
episode_id | Tensor | | int64 | |
steps | Dataset | | | |
steps/action | Tensor | (38,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | FeaturesDict | | | |
steps/observation/walker | FeaturesDict | | | |
steps/observation/walker/appendages_pos | Tensor | (15,) | float32 | |
steps/observation/walker/egocentric_camera | Image | (64, 64, 3) | uint8 | |
steps/observation/walker/joints_pos | Tensor | (30,) | float32 | |
steps/observation/walker/joints_vel | Tensor | (30,) | float32 | |
steps/observation/walker/sensors_accelerometer | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_gyro | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_touch | Tensor | (4,) | float32 | |
steps/observation/walker/sensors_velocimeter | Tensor | (3,) | float32 | |
steps/observation/walker/tendons_pos | Tensor | (8,) | float32 | |
steps/observation/walker/tendons_vel | Tensor | (8,) | float32 | |
steps/observation/walker/world_zaxis | Tensor | (3,) | float32 | |
steps/reward | Tensor | | float32 | |
timestamp | Tensor | | int64 | |
- Examples (tfds.as_dataframe):
rlu_locomotion/rodent_gaps
Dataset size: 8.90 GiB
Splits:
Split | Examples |
---|---|
'train' | 2,000 |
- Feature structure:
FeaturesDict({
'episode_id': int64,
'steps': Dataset({
'action': Tensor(shape=(38,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': FeaturesDict({
'walker': FeaturesDict({
'appendages_pos': Tensor(shape=(15,), dtype=float32),
'egocentric_camera': Image(shape=(64, 64, 3), dtype=uint8),
'joints_pos': Tensor(shape=(30,), dtype=float32),
'joints_vel': Tensor(shape=(30,), dtype=float32),
'sensors_accelerometer': Tensor(shape=(3,), dtype=float32),
'sensors_gyro': Tensor(shape=(3,), dtype=float32),
'sensors_touch': Tensor(shape=(4,), dtype=float32),
'sensors_velocimeter': Tensor(shape=(3,), dtype=float32),
'tendons_pos': Tensor(shape=(8,), dtype=float32),
'tendons_vel': Tensor(shape=(8,), dtype=float32),
'world_zaxis': Tensor(shape=(3,), dtype=float32),
}),
}),
'reward': float32,
}),
'timestamp': int64,
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
episode_id | Tensor | | int64 | |
steps | Dataset | | | |
steps/action | Tensor | (38,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | FeaturesDict | | | |
steps/observation/walker | FeaturesDict | | | |
steps/observation/walker/appendages_pos | Tensor | (15,) | float32 | |
steps/observation/walker/egocentric_camera | Image | (64, 64, 3) | uint8 | |
steps/observation/walker/joints_pos | Tensor | (30,) | float32 | |
steps/observation/walker/joints_vel | Tensor | (30,) | float32 | |
steps/observation/walker/sensors_accelerometer | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_gyro | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_touch | Tensor | (4,) | float32 | |
steps/observation/walker/sensors_velocimeter | Tensor | (3,) | float32 | |
steps/observation/walker/tendons_pos | Tensor | (8,) | float32 | |
steps/observation/walker/tendons_vel | Tensor | (8,) | float32 | |
steps/observation/walker/world_zaxis | Tensor | (3,) | float32 | |
steps/reward | Tensor | | float32 | |
timestamp | Tensor | | int64 | |
- Examples (tfds.as_dataframe):
rlu_locomotion/rodent_mazes
Dataset size: 20.71 GiB
Splits:
Split | Examples |
---|---|
'train' | 2,000 |
- Feature structure:
FeaturesDict({
'episode_id': int64,
'steps': Dataset({
'action': Tensor(shape=(38,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': FeaturesDict({
'walker': FeaturesDict({
'appendages_pos': Tensor(shape=(15,), dtype=float32),
'egocentric_camera': Image(shape=(64, 64, 3), dtype=uint8),
'joints_pos': Tensor(shape=(30,), dtype=float32),
'joints_vel': Tensor(shape=(30,), dtype=float32),
'sensors_accelerometer': Tensor(shape=(3,), dtype=float32),
'sensors_gyro': Tensor(shape=(3,), dtype=float32),
'sensors_touch': Tensor(shape=(4,), dtype=float32),
'sensors_velocimeter': Tensor(shape=(3,), dtype=float32),
'tendons_pos': Tensor(shape=(8,), dtype=float32),
'tendons_vel': Tensor(shape=(8,), dtype=float32),
'world_zaxis': Tensor(shape=(3,), dtype=float32),
}),
}),
'reward': float32,
}),
'timestamp': int64,
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
episode_id | Tensor | | int64 | |
steps | Dataset | | | |
steps/action | Tensor | (38,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | FeaturesDict | | | |
steps/observation/walker | FeaturesDict | | | |
steps/observation/walker/appendages_pos | Tensor | (15,) | float32 | |
steps/observation/walker/egocentric_camera | Image | (64, 64, 3) | uint8 | |
steps/observation/walker/joints_pos | Tensor | (30,) | float32 | |
steps/observation/walker/joints_vel | Tensor | (30,) | float32 | |
steps/observation/walker/sensors_accelerometer | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_gyro | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_touch | Tensor | (4,) | float32 | |
steps/observation/walker/sensors_velocimeter | Tensor | (3,) | float32 | |
steps/observation/walker/tendons_pos | Tensor | (8,) | float32 | |
steps/observation/walker/tendons_vel | Tensor | (8,) | float32 | |
steps/observation/walker/world_zaxis | Tensor | (3,) | float32 | |
steps/reward | Tensor | | float32 | |
timestamp | Tensor | | int64 | |
- Examples (tfds.as_dataframe):
rlu_locomotion/rodent_two_touch
Dataset size: 23.05 GiB
Splits:
Split | Examples |
---|---|
'train' | 2,000 |
- Feature structure:
FeaturesDict({
'episode_id': int64,
'steps': Dataset({
'action': Tensor(shape=(38,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': FeaturesDict({
'walker': FeaturesDict({
'appendages_pos': Tensor(shape=(15,), dtype=float32),
'egocentric_camera': Image(shape=(64, 64, 3), dtype=uint8),
'joints_pos': Tensor(shape=(30,), dtype=float32),
'joints_vel': Tensor(shape=(30,), dtype=float32),
'sensors_accelerometer': Tensor(shape=(3,), dtype=float32),
'sensors_gyro': Tensor(shape=(3,), dtype=float32),
'sensors_touch': Tensor(shape=(4,), dtype=float32),
'sensors_velocimeter': Tensor(shape=(3,), dtype=float32),
'tendons_pos': Tensor(shape=(8,), dtype=float32),
'tendons_vel': Tensor(shape=(8,), dtype=float32),
'world_zaxis': Tensor(shape=(3,), dtype=float32),
}),
}),
'reward': float32,
}),
'timestamp': int64,
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
episode_id | Tensor | | int64 | |
steps | Dataset | | | |
steps/action | Tensor | (38,) | float32 | |
steps/discount | Tensor | | float32 | |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/observation | FeaturesDict | | | |
steps/observation/walker | FeaturesDict | | | |
steps/observation/walker/appendages_pos | Tensor | (15,) | float32 | |
steps/observation/walker/egocentric_camera | Image | (64, 64, 3) | uint8 | |
steps/observation/walker/joints_pos | Tensor | (30,) | float32 | |
steps/observation/walker/joints_vel | Tensor | (30,) | float32 | |
steps/observation/walker/sensors_accelerometer | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_gyro | Tensor | (3,) | float32 | |
steps/observation/walker/sensors_touch | Tensor | (4,) | float32 | |
steps/observation/walker/sensors_velocimeter | Tensor | (3,) | float32 | |
steps/observation/walker/tendons_pos | Tensor | (8,) | float32 | |
steps/observation/walker/tendons_vel | Tensor | (8,) | float32 | |
steps/observation/walker/world_zaxis | Tensor | (3,) | float32 | |
steps/reward | Tensor | | float32 | |
timestamp | Tensor | | int64 | |
- Examples (tfds.as_dataframe):