rlu_control_suite

Description:

RL Unplugged is a suite of benchmarks for offline reinforcement learning. It is designed around the following consideration: for ease of use, the datasets come with a unified API, so that once a general pipeline is in place, practitioners can work with all the data in the suite.

The datasets follow the RLDS format to represent steps and episodes.
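As an illustration of the step and episode layout, the sketch below turns the steps of one episode into (observation, action, reward, discount, next observation) transitions. It assumes the usual RLDS alignment, in which the action, reward, and discount stored in a step belong to the move away from that step's observation, so the last step contributes only its observation. The helper name `to_transitions` and the toy episode are hypothetical, and plain Python values stand in for tensors.

```python
from typing import Any, Dict, Iterable, List

Step = Dict[str, Any]


def to_transitions(steps: Iterable[Step]) -> List[Dict[str, Any]]:
    """Pair every step with the observation of the step that follows it."""
    steps = list(steps)
    transitions = []
    for current, following in zip(steps, steps[1:]):
        transitions.append({
            "observation": current["observation"],
            "action": current["action"],
            "reward": current["reward"],
            "discount": current["discount"],
            "next_observation": following["observation"],
        })
    return transitions


# Toy episode with made-up values (not taken from the dataset).
toy_episode = [
    {"observation": [0.0], "action": [1.0], "reward": 0.5, "discount": 1.0,
     "is_first": True, "is_last": False, "is_terminal": False},
    {"observation": [0.1], "action": [-1.0], "reward": 0.7, "discount": 1.0,
     "is_first": False, "is_last": False, "is_terminal": False},
    {"observation": [0.2], "action": [0.0], "reward": 0.0, "discount": 0.0,
     "is_first": False, "is_last": True, "is_terminal": False},
]

transitions = to_transitions(toy_episode)
print(len(transitions))  # 2
```

With real data, the same function could be applied to the `steps` of each episode once they have been converted to NumPy values, for example with `tfds.as_numpy`.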

DeepMind Control Suite (Tassa et al., 2018) is a set of control tasks implemented in MuJoCo (Todorov et al., 2012). We consider a subset of the tasks provided in the suite, covering a wide range of difficulties.

Most of the datasets in this domain are generated using D4PG. For the Manipulator insert ball and Manipulator insert peg environments, we use V-MPO (Song et al., 2020) to generate the data, because D4PG is unable to solve these tasks. We release datasets for 9 control suite tasks. For details on how the datasets were generated, please refer to the paper.

DeepMind Control Suite is a traditional continuous-action RL benchmark. In particular, we recommend testing your approach on DeepMind Control Suite if you are interested in comparing against other state-of-the-art offline RL methods.

Homepage: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged
Source code: tfds.rl_unplugged.rlu_control_suite.RluControlSuite
Versions:
- 1.0.0 (default): Initial release.
Download size: Unknown size
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{gulcehre2020rl,
 title = {RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning},
 author = {Gulcehre, Caglar and Wang, Ziyu and Novikov, Alexander and Paine, Thomas and G\'{o}mez, Sergio and Zolna, Konrad and Agarwal, Rishabh and Merel, Josh S and Mankowitz, Daniel J and Paduraru, Cosmin and Dulac-Arnold, Gabriel and Li, Jerry and Norouzi, Mohammad and Hoffman, Matthew and Heess, Nicolas and de Freitas, Nando},
 booktitle = {Advances in Neural Information Processing Systems},
 pages = {7248--7259},
 volume = {33},
 year = {2020}
}
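
The configs described below can be requested by name through `tfds.load`. The following is a minimal sketch: the helper names `tfds_name` and `load_episodes` are illustrative and not part of TFDS, and the task list is copied from the config names on this page. The import is deferred so that the name helper still works where `tensorflow_datasets` is not installed.

```python
BUILDER = "rlu_control_suite"

# Config names as listed on this page.
TASKS = (
    "cartpole_swingup",
    "cheetah_run",
    "finger_turn_hard",
    "fish_swim",
    "humanoid_run",
    "manipulator_insert_ball",
    "manipulator_insert_peg",
    "walker_stand",
    "walker_walk",
)


def tfds_name(task: str) -> str:
    """Build the 'builder/config' name for one of the tasks above."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"{BUILDER}/{task}"


def load_episodes(task: str, split: str = "train"):
    """Load one config as a dataset of episodes (hypothetical helper)."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading

    return tfds.load(tfds_name(task), split=split)


# Example usage (prepares the data on first call). It assumes 'steps' is
# exposed as a nested dataset, as the feature structures on this page indicate:
#   episodes = load_episodes("cartpole_swingup")
#   for episode in episodes.take(1):
#       for step in episode["steps"]:
#           print(step["reward"])
```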

rlu_control_suite/cartpole_swingup (default config)

Dataset size: 2.12 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	40

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(1,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'position': Tensor(shape=(3,), dtype=float32),
            'velocity': Tensor(shape=(2,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})
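
Many learners expect a single flat state vector rather than a dictionary of named observations. The sketch below shows one way to concatenate the `position` (3 values) and `velocity` (2 values) entries from the structure above into a vector of length 5. The function name `flatten_observation` and the alphabetical key order are assumptions made for illustration, and plain lists stand in for tensors.

```python
from typing import Dict, List, Sequence


def flatten_observation(observation: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate all observation entries in alphabetical key order."""
    flat: List[float] = []
    for key in sorted(observation):
        flat.extend(float(value) for value in observation[key])
    return flat


# Toy cartpole_swingup observation with made-up values.
toy_observation = {
    "position": [0.0, 1.0, 0.0],
    "velocity": [0.0, 0.0],
}

flat = flatten_observation(toy_observation)
print(len(flat))  # 5
```

Sorting the keys keeps the layout of the vector stable across steps, which matters once the vector is fed to a model.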

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(1,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/position	Tensor	(3,)	float32
steps/observation/velocity	Tensor	(2,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/cheetah_run

Dataset size: 36.58 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	300

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'position': Tensor(shape=(8,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})
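
Because each config has its own action and observation sizes, a quick size check can catch a pipeline that was wired to the wrong config. The sketch below compares one step against the `cheetah_run` sizes from the structure above. The spec dictionary and the `check_step` helper are hypothetical, and plain lists stand in for tensors.

```python
from typing import Any, Dict, List

# Sizes copied from the cheetah_run feature structure above.
CHEETAH_RUN_SPEC: Dict[str, Any] = {
    "action": 6,
    "observation": {"position": 8, "velocity": 9},
}


def check_step(step: Dict[str, Any],
               spec: Dict[str, Any] = CHEETAH_RUN_SPEC) -> List[str]:
    """Return a list of size mismatches; the list is empty if the step fits."""
    problems: List[str] = []
    if len(step["action"]) != spec["action"]:
        problems.append(
            f"action: expected {spec['action']}, got {len(step['action'])}")
    for name, size in spec["observation"].items():
        value = step["observation"].get(name)
        if value is None:
            problems.append(f"observation/{name}: missing")
        elif len(value) != size:
            problems.append(
                f"observation/{name}: expected {size}, got {len(value)}")
    return problems


# Toy step with placeholder values of the right sizes.
good_step = {
    "action": [0.0] * 6,
    "observation": {"position": [0.0] * 8, "velocity": [0.0] * 9},
}
print(check_step(good_step))  # []
```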

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/position	Tensor	(8,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/finger_turn_hard

Dataset size: 47.61 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(2,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'dist_to_target': Tensor(shape=(1,), dtype=float32),
            'position': Tensor(shape=(4,), dtype=float32),
            'target_position': Tensor(shape=(2,), dtype=float32),
            'velocity': Tensor(shape=(3,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(2,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/dist_to_target	Tensor	(1,)	float32
steps/observation/position	Tensor	(4,)	float32
steps/observation/target_position	Tensor	(2,)	float32
steps/observation/velocity	Tensor	(3,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/fish_swim

Dataset size: 32.81 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'joint_angles': Tensor(shape=(7,), dtype=float32),
            'target': Tensor(shape=(3,), dtype=float32),
            'upright': Tensor(shape=(1,), dtype=float32),
            'velocity': Tensor(shape=(13,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/joint_angles	Tensor	(7,)	float32
steps/observation/target	Tensor	(3,)	float32
steps/observation/upright	Tensor	(1,)	float32
steps/observation/velocity	Tensor	(13,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/humanoid_run

Dataset size: 1.21 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	3,000

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(21,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'com_velocity': Tensor(shape=(3,), dtype=float32),
            'extremities': Tensor(shape=(12,), dtype=float32),
            'head_height': Tensor(shape=(1,), dtype=float32),
            'joint_angles': Tensor(shape=(21,), dtype=float32),
            'torso_vertical': Tensor(shape=(3,), dtype=float32),
            'velocity': Tensor(shape=(27,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(21,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/com_velocity	Tensor	(3,)	float32
steps/observation/extremities	Tensor	(12,)	float32
steps/observation/head_height	Tensor	(1,)	float32
steps/observation/joint_angles	Tensor	(21,)	float32
steps/observation/torso_vertical	Tensor	(3,)	float32
steps/observation/velocity	Tensor	(27,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/manipulator_insert_ball

Dataset size: 385.41 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'arm_pos': Tensor(shape=(16,), dtype=float32),
            'arm_vel': Tensor(shape=(8,), dtype=float32),
            'hand_pos': Tensor(shape=(4,), dtype=float32),
            'object_pos': Tensor(shape=(4,), dtype=float32),
            'object_vel': Tensor(shape=(3,), dtype=float32),
            'target_pos': Tensor(shape=(4,), dtype=float32),
            'touch': Tensor(shape=(5,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/arm_pos	Tensor	(16,)	float32
steps/observation/arm_vel	Tensor	(8,)	float32
steps/observation/hand_pos	Tensor	(4,)	float32
steps/observation/object_pos	Tensor	(4,)	float32
steps/observation/object_vel	Tensor	(3,)	float32
steps/observation/target_pos	Tensor	(4,)	float32
steps/observation/touch	Tensor	(5,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/manipulator_insert_peg

Dataset size: 385.73 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'arm_pos': Tensor(shape=(16,), dtype=float32),
            'arm_vel': Tensor(shape=(8,), dtype=float32),
            'hand_pos': Tensor(shape=(4,), dtype=float32),
            'object_pos': Tensor(shape=(4,), dtype=float32),
            'object_vel': Tensor(shape=(3,), dtype=float32),
            'target_pos': Tensor(shape=(4,), dtype=float32),
            'touch': Tensor(shape=(5,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/arm_pos	Tensor	(16,)	float32
steps/observation/arm_vel	Tensor	(8,)	float32
steps/observation/hand_pos	Tensor	(4,)	float32
steps/observation/object_pos	Tensor	(4,)	float32
steps/observation/object_vel	Tensor	(3,)	float32
steps/observation/target_pos	Tensor	(4,)	float32
steps/observation/touch	Tensor	(5,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/walker_stand

Dataset size: 31.78 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'height': Tensor(shape=(1,), dtype=float32),
            'orientations': Tensor(shape=(14,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/height	Tensor	(1,)	float32
steps/observation/orientations	Tensor	(14,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/walker_walk

Dataset size: 31.78 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'height': Tensor(shape=(1,), dtype=float32),
            'orientations': Tensor(shape=(14,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/height	Tensor	(1,)	float32
steps/observation/orientations	Tensor	(14,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):