rlu_control_suite

Description:

RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following considerations: to facilitate ease of use, we provide the datasets with a unified API, which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established.

The datasets follow the RLDS format to represent steps and episodes.
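
As a rough illustration of the step and episode layout, the sketch below builds a toy episode by hand using the field names that appear in the feature structures on this page (episode_id, steps, reward, discount, is_first, is_last, is_terminal). The values and the helper names `episode_return` and `is_well_formed` are invented for illustration; they are not taken from the dataset or from any library.

```python
# Toy RLDS-style episode: metadata plus an ordered list of steps.
# Values are invented for illustration; only the field names follow this page.
toy_episode = {
    "episode_id": 0,
    "steps": [
        {"reward": 0.0, "discount": 1.0, "is_first": True, "is_last": False, "is_terminal": False},
        {"reward": 0.5, "discount": 1.0, "is_first": False, "is_last": False, "is_terminal": False},
        {"reward": 1.0, "discount": 1.0, "is_first": False, "is_last": True, "is_terminal": False},
    ],
}


def episode_return(episode):
    """Undiscounted sum of rewards over all steps of one episode."""
    return sum(step["reward"] for step in episode["steps"])


def is_well_formed(episode):
    """Check that only the first step has is_first and only the last has is_last."""
    steps = episode["steps"]
    firsts = [s["is_first"] for s in steps]
    lasts = [s["is_last"] for s in steps]
    return (
        len(steps) > 0
        and firsts == [True] + [False] * (len(steps) - 1)
        and lasts == [False] * (len(steps) - 1) + [True]
    )


print(episode_return(toy_episode))
print(is_well_formed(toy_episode))
```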

DeepMind Control Suite (Tassa et al., 2018) is a set of control tasks implemented in MuJoCo (Todorov et al., 2012). We consider a subset of the tasks provided in the suite that covers a wide range of difficulties.

Most of the datasets in this domain are generated using D4PG. For the Manipulator insert ball and Manipulator insert peg environments, we use V-MPO (Song et al., 2020) to generate the data, as D4PG is unable to solve these tasks. We release datasets for 9 control suite tasks. For details on how the datasets were generated, please refer to the paper.

DeepMind Control Suite is a traditional continuous-action RL benchmark. In particular, we recommend that you test your approach in DeepMind Control Suite if you are interested in comparing against other state-of-the-art offline RL methods.
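
A minimal loading sketch, assuming `tensorflow_datasets` is installed and the data can be fetched in your environment. The task names are the nine configs listed on this page; the helper names `config_name` and `load_task` are hypothetical. The import is deferred into `load_task` so that the name helper works without TensorFlow installed.

```python
# Task names as they appear in the config headings on this page.
TASKS = (
    "cartpole_swingup", "cheetah_run", "finger_turn_hard", "fish_swim",
    "humanoid_run", "manipulator_insert_ball", "manipulator_insert_peg",
    "walker_stand", "walker_walk",
)


def config_name(task):
    """Build the 'rlu_control_suite/<task>' name passed to tfds.load."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"rlu_control_suite/{task}"


def load_task(task, split="train"):
    """Load one task as a dataset of episodes (may download on first use)."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading
    return tfds.load(config_name(task), split=split)


print(config_name("cartpole_swingup"))
```

Calling `load_task("cartpole_swingup")` should return episodes that each hold a nested `steps` dataset, as described by the feature structures on this page.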

Homepage: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged
Source code: tfds.rl_unplugged.rlu_control_suite.RluControlSuite
Versions:
- 1.0.0 (default): Initial release.
Download size: Unknown size
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{gulcehre2020rl,
 title = {RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning},
 author = {Gulcehre, Caglar and Wang, Ziyu and Novikov, Alexander and Paine, Thomas and G{\'o}mez, Sergio and Zolna, Konrad and Agarwal, Rishabh and Merel, Josh S and Mankowitz, Daniel J and Paduraru, Cosmin and Dulac-Arnold, Gabriel and Li, Jerry and Norouzi, Mohammad and Hoffman, Matthew and Heess, Nicolas and de Freitas, Nando},
 booktitle = {Advances in Neural Information Processing Systems},
 pages = {7248--7259},
 volume = {33},
 year = {2020}
}

rlu_control_suite/cartpole_swingup (default config)

Dataset size: 2.12 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	40

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(1,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'position': Tensor(shape=(3,), dtype=float32),
            'velocity': Tensor(shape=(2,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})
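
Many agents expect a single flat observation vector rather than a dictionary of fields. The sketch below, assuming the observation fields have already been converted to plain Python lists, concatenates them in sorted key order. The function name `flatten_observation` and the sample values are illustrative only. For the cartpole_swingup shapes above this gives 3 + 2 = 5 values.

```python
def flatten_observation(observation):
    """Concatenate observation fields into one list, in sorted key order."""
    flat = []
    for key in sorted(observation):
        flat.extend(observation[key])
    return flat


# Invented values with the cartpole_swingup shapes: position (3,), velocity (2,).
sample = {"position": [0.1, 0.2, 0.3], "velocity": [-1.0, 1.0]}
flat = flatten_observation(sample)
print(len(flat))
```

Sorting the keys keeps the layout of the flat vector stable regardless of the order in which the dictionary was built.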

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(1,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/position	Tensor	(3,)	float32
steps/observation/velocity	Tensor	(2,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/cheetah_run

Dataset size: 36.58 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	300

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'position': Tensor(shape=(8,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})
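
Offline RL methods commonly train on transitions (observation, action, reward, discount, next observation) rather than on raw steps. The sketch below pairs consecutive steps of one episode, assuming the steps are available as a list of dicts with the field names from the feature structure above, and assuming the RLDS convention that a step's reward and discount result from that step's action. The helper `to_transitions` is hypothetical and the toy values are invented; real observations here are dictionaries and real actions have shape (6,).

```python
def to_transitions(steps):
    """Pair each step with its successor to form (o, a, r, d, o') tuples."""
    transitions = []
    for current, following in zip(steps, steps[1:]):
        transitions.append((
            current["observation"],
            current["action"],
            current["reward"],
            current["discount"],
            following["observation"],
        ))
    return transitions


# Toy three-step episode with one-dimensional stand-ins for the real fields.
toy_steps = [
    {"observation": [0.0], "action": [0.1], "reward": 0.0, "discount": 1.0},
    {"observation": [1.0], "action": [0.2], "reward": 0.5, "discount": 1.0},
    {"observation": [2.0], "action": [0.3], "reward": 1.0, "discount": 1.0},
]
pairs = to_transitions(toy_steps)
print(len(pairs))
```

An episode with N steps yields N - 1 transitions, because the final step has no successor observation.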

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/position	Tensor	(8,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/finger_turn_hard

Dataset size: 47.61 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(2,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'dist_to_target': Tensor(shape=(1,), dtype=float32),
            'position': Tensor(shape=(4,), dtype=float32),
            'target_position': Tensor(shape=(2,), dtype=float32),
            'velocity': Tensor(shape=(3,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(2,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/dist_to_target	Tensor	(1,)	float32
steps/observation/position	Tensor	(4,)	float32
steps/observation/target_position	Tensor	(2,)	float32
steps/observation/velocity	Tensor	(3,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/fish_swim

Dataset size: 32.81 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'joint_angles': Tensor(shape=(7,), dtype=float32),
            'target': Tensor(shape=(3,), dtype=float32),
            'upright': Tensor(shape=(1,), dtype=float32),
            'velocity': Tensor(shape=(13,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/joint_angles	Tensor	(7,)	float32
steps/observation/target	Tensor	(3,)	float32
steps/observation/upright	Tensor	(1,)	float32
steps/observation/velocity	Tensor	(13,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/humanoid_run

Dataset size: 1.21 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	3,000

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(21,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'com_velocity': Tensor(shape=(3,), dtype=float32),
            'extremities': Tensor(shape=(12,), dtype=float32),
            'head_height': Tensor(shape=(1,), dtype=float32),
            'joint_angles': Tensor(shape=(21,), dtype=float32),
            'torso_vertical': Tensor(shape=(3,), dtype=float32),
            'velocity': Tensor(shape=(27,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(21,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/com_velocity	Tensor	(3,)	float32
steps/observation/extremities	Tensor	(12,)	float32
steps/observation/head_height	Tensor	(1,)	float32
steps/observation/joint_angles	Tensor	(21,)	float32
steps/observation/torso_vertical	Tensor	(3,)	float32
steps/observation/velocity	Tensor	(27,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/manipulator_insert_ball

Dataset size: 385.41 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'arm_pos': Tensor(shape=(16,), dtype=float32),
            'arm_vel': Tensor(shape=(8,), dtype=float32),
            'hand_pos': Tensor(shape=(4,), dtype=float32),
            'object_pos': Tensor(shape=(4,), dtype=float32),
            'object_vel': Tensor(shape=(3,), dtype=float32),
            'target_pos': Tensor(shape=(4,), dtype=float32),
            'touch': Tensor(shape=(5,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/arm_pos	Tensor	(16,)	float32
steps/observation/arm_vel	Tensor	(8,)	float32
steps/observation/hand_pos	Tensor	(4,)	float32
steps/observation/object_pos	Tensor	(4,)	float32
steps/observation/object_vel	Tensor	(3,)	float32
steps/observation/target_pos	Tensor	(4,)	float32
steps/observation/touch	Tensor	(5,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/manipulator_insert_peg

Dataset size: 385.73 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,500

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'arm_pos': Tensor(shape=(16,), dtype=float32),
            'arm_vel': Tensor(shape=(8,), dtype=float32),
            'hand_pos': Tensor(shape=(4,), dtype=float32),
            'object_pos': Tensor(shape=(4,), dtype=float32),
            'object_vel': Tensor(shape=(3,), dtype=float32),
            'target_pos': Tensor(shape=(4,), dtype=float32),
            'touch': Tensor(shape=(5,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(5,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/arm_pos	Tensor	(16,)	float32
steps/observation/arm_vel	Tensor	(8,)	float32
steps/observation/hand_pos	Tensor	(4,)	float32
steps/observation/object_pos	Tensor	(4,)	float32
steps/observation/object_vel	Tensor	(3,)	float32
steps/observation/target_pos	Tensor	(4,)	float32
steps/observation/touch	Tensor	(5,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/walker_stand

Dataset size: 31.78 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'height': Tensor(shape=(1,), dtype=float32),
            'orientations': Tensor(shape=(14,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/height	Tensor	(1,)	float32
steps/observation/orientations	Tensor	(14,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):

rlu_control_suite/walker_walk

Dataset size: 31.78 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	200

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'steps': Dataset({
        'action': Tensor(shape=(6,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'height': Tensor(shape=(1,), dtype=float32),
            'orientations': Tensor(shape=(14,), dtype=float32),
            'velocity': Tensor(shape=(9,), dtype=float32),
        }),
        'reward': float32,
    }),
    'timestamp': int64,
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
steps	Dataset
steps/action	Tensor	(6,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/height	Tensor	(1,)	float32
steps/observation/orientations	Tensor	(14,)	float32
steps/observation/velocity	Tensor	(9,)	float32
steps/reward	Tensor		float32
timestamp	Tensor		int64

Examples (tfds.as_dataframe):
