d4rl_mujoco_ant

Description:

D4RL is an open-source benchmark for offline reinforcement learning. It provides standardized environments and datasets for training and benchmarking algorithms.

The datasets follow the RLDS format to represent steps and episodes.
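
As a rough illustration of the RLDS layout, an episode can be pictured as a sequence of step records carrying the fields listed in the feature structures on this page. The sketch below uses plain Python dicts rather than `tf.data` datasets, and the helper `check_episode` is hypothetical (it is not part of TFDS or RLDS). It only checks the flag conventions: the first step alone has `is_first`, the last step alone has `is_last`, and `is_terminal` may only be set on the last step.

```python
from typing import Dict, List

Step = Dict[str, object]


def check_episode(steps: List[Step]) -> bool:
    """Return True if the step flags follow the RLDS conventions."""
    if not steps:
        return False
    for i, step in enumerate(steps):
        first = i == 0
        last = i == len(steps) - 1
        if step["is_first"] != first or step["is_last"] != last:
            return False
        # A terminal step is necessarily the last step of its episode.
        if step["is_terminal"] and not last:
            return False
    return True


# Toy three-step episode; the values are made up and the real
# observations have 111 entries, the real actions 8.
episode = [
    {"observation": [0.0], "action": [0.1], "reward": 1.0, "discount": 1.0,
     "is_first": True, "is_last": False, "is_terminal": False},
    {"observation": [0.5], "action": [0.2], "reward": 1.0, "discount": 1.0,
     "is_first": False, "is_last": False, "is_terminal": False},
    {"observation": [1.0], "action": [0.0], "reward": 0.0, "discount": 0.0,
     "is_first": False, "is_last": True, "is_terminal": True},
]
print(check_episode(episode))
```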

Config description: See more details about the task and its versions in https://github.com/rail-berkeley/d4rl/wiki/Tasks#gym
Homepage: https://sites.google.com/view/d4rl-anonymous
Source code: tfds.d4rl.d4rl_mujoco_ant.D4rlMujocoAnt
Versions:
- 1.0.0: Initial release.
- 1.1.0: Added is_last.
- 1.2.0 (default): Updated to take into account the next observation.
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@misc{fu2020d4rl,
    title={D4RL: Datasets for Deep Data-Driven Reinforcement Learning},
    author={Justin Fu and Aviral Kumar and Ofir Nachum and George Tucker and Sergey Levine},
    year={2020},
    eprint={2004.07219},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
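
Each config listed below is addressed by a name of the form `d4rl_mujoco_ant/<version>-<quality>`. A minimal sketch of building such a name and loading the corresponding `train` split follows. The helpers `config_name` and `load_episodes` are illustrative names, not part of TFDS; only `tfds.load` with its `split` and `shuffle_files` arguments is a real TFDS call. The import is done lazily so the name helper works without TensorFlow installed.

```python
def config_name(version: str, quality: str) -> str:
    """Build a TFDS name such as 'd4rl_mujoco_ant/v2-medium'."""
    return f"d4rl_mujoco_ant/{version}-{quality}"


def load_episodes(version: str = "v0", quality: str = "expert"):
    """Load the 'train' split of one config as a dataset of episodes."""
    # Imported lazily so that building names does not require TensorFlow.
    import tensorflow_datasets as tfds

    # shuffle_files=False keeps the file order fixed; some configs on this
    # page are auto-cached only in that case.
    return tfds.load(config_name(version, quality), split="train",
                     shuffle_files=False)


print(config_name("v2", "medium-replay"))
```

With TFDS installed, each element of the returned dataset should be one episode, and `episode["steps"]` is itself a nested dataset of steps that can be iterated in turn.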

d4rl_mujoco_ant/v0-expert (default config)

Download size: 131.34 MiB
Dataset size: 464.94 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,288

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})
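
Offline RL code often consumes (observation, action, reward, discount, next observation) tuples rather than raw steps. The following is a minimal sketch of that conversion, assuming an episode has already been materialised as a list of dicts with the field names above, and assuming the final step contributes only its observation as the "next observation" of the step before it. The function `to_transitions` is illustrative, not a TFDS or RLDS API.

```python
from typing import Dict, List, Tuple

Transition = Tuple[list, list, float, float, list]


def to_transitions(steps: List[Dict]) -> List[Transition]:
    """Pair each step with the observation of the step that follows it."""
    transitions = []
    for current, nxt in zip(steps, steps[1:]):
        transitions.append((
            current["observation"],
            current["action"],
            current["reward"],
            current["discount"],
            nxt["observation"],
        ))
    return transitions


# Toy episode with 1-dimensional observations (the real ones have 111 entries).
steps = [
    {"observation": [0.0], "action": [0.5], "reward": 1.0, "discount": 1.0},
    {"observation": [1.0], "action": [0.2], "reward": 2.0, "discount": 1.0},
    {"observation": [2.0], "action": [0.0], "reward": 0.0, "discount": 0.0},
]
transitions = to_transitions(steps)
print(len(transitions))  # 2
```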

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v0-medium

Download size: 131.39 MiB
Dataset size: 464.78 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,122

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v0-medium-expert

Download size: 262.73 MiB
Dataset size: 929.71 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	2,410

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v0-mixed

Download size: 104.63 MiB
Dataset size: 464.93 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,320

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v0-random

Download size: 139.50 MiB
Dataset size: 464.97 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,377

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v1-expert

Download size: 220.72 MiB
Dataset size: 968.63 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,033

Feature structure:

FeaturesDict({
    'algorithm': string,
    'iteration': int32,
    'policy': FeaturesDict({
        'fc0': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 111), dtype=float32),
        }),
        'fc1': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 256), dtype=float32),
        }),
        'last_fc': FeaturesDict({
            'bias': Tensor(shape=(8,), dtype=float32),
            'weight': Tensor(shape=(8, 256), dtype=float32),
        }),
        'last_fc_log_std': FeaturesDict({
            'bias': Tensor(shape=(8,), dtype=float32),
            'weight': Tensor(shape=(8, 256), dtype=float32),
        }),
        'nonlinearity': string,
        'output_distribution': string,
    }),
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float32,
            'qpos': Tensor(shape=(15,), dtype=float32),
            'qvel': Tensor(shape=(14,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})
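
The `policy` entry stores the weights of a small fully connected network: two hidden layers of width 256 (`fc0`, `fc1`), a mean head (`last_fc`) and a log-standard-deviation head (`last_fc_log_std`), each mapping to the 8 action dimensions. A hedged sketch of a deterministic forward pass with these shapes follows, in pure Python. It assumes a ReLU nonlinearity and a tanh-squashed Gaussian output; the actual choices are recorded in the `nonlinearity` and `output_distribution` strings and should be checked there. The weights below are random placeholders, the helper names are illustrative, and the log-std head (needed only for sampling) is omitted.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute weight @ x + bias for an (out, in) weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def policy_mean_action(policy: Dict, observation: Vector) -> Vector:
    """Deterministic action: tanh of the mean head (assumed tanh-Gaussian)."""
    h = relu(linear(policy["fc0"]["weight"], policy["fc0"]["bias"], observation))
    h = relu(linear(policy["fc1"]["weight"], policy["fc1"]["bias"], h))
    mean = linear(policy["last_fc"]["weight"], policy["last_fc"]["bias"], h)
    return [math.tanh(m) for m in mean]


def random_layer(rng: random.Random, out_dim: int, in_dim: int) -> Dict:
    """Placeholder layer with the (out, in) weight layout used above."""
    return {
        "weight": [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(out_dim)],
        "bias": [0.0] * out_dim,
    }


rng = random.Random(0)
policy = {
    "fc0": random_layer(rng, 256, 111),
    "fc1": random_layer(rng, 256, 256),
    "last_fc": random_layer(rng, 8, 256),
}
observation = [rng.uniform(-1.0, 1.0) for _ in range(111)]
action = policy_mean_action(policy, observation)
print(len(action))  # 8
```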

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
iteration	Tensor		int32
policy	FeaturesDict
policy/fc0	FeaturesDict
policy/fc0/bias	Tensor	(256,)	float32
policy/fc0/weight	Tensor	(256, 111)	float32
policy/fc1	FeaturesDict
policy/fc1/bias	Tensor	(256,)	float32
policy/fc1/weight	Tensor	(256, 256)	float32
policy/last_fc	FeaturesDict
policy/last_fc/bias	Tensor	(8,)	float32
policy/last_fc/weight	Tensor	(8, 256)	float32
policy/last_fc_log_std	FeaturesDict
policy/last_fc_log_std/bias	Tensor	(8,)	float32
policy/last_fc_log_std/weight	Tensor	(8, 256)	float32
policy/nonlinearity	Tensor		string
policy/output_distribution	Tensor		string
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float32
steps/infos/qpos	Tensor	(15,)	float32
steps/infos/qvel	Tensor	(14,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v1-medium

Download size: 222.39 MiB
Dataset size: 1023.71 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,179

Feature structure:

FeaturesDict({
    'algorithm': string,
    'iteration': int32,
    'policy': FeaturesDict({
        'fc0': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 111), dtype=float32),
        }),
        'fc1': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 256), dtype=float32),
        }),
        'last_fc': FeaturesDict({
            'bias': Tensor(shape=(8,), dtype=float32),
            'weight': Tensor(shape=(8, 256), dtype=float32),
        }),
        'last_fc_log_std': FeaturesDict({
            'bias': Tensor(shape=(8,), dtype=float32),
            'weight': Tensor(shape=(8, 256), dtype=float32),
        }),
        'nonlinearity': string,
        'output_distribution': string,
    }),
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float32,
            'qpos': Tensor(shape=(15,), dtype=float32),
            'qvel': Tensor(shape=(14,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
iteration	Tensor		int32
policy	FeaturesDict
policy/fc0	FeaturesDict
policy/fc0/bias	Tensor	(256,)	float32
policy/fc0/weight	Tensor	(256, 111)	float32
policy/fc1	FeaturesDict
policy/fc1/bias	Tensor	(256,)	float32
policy/fc1/weight	Tensor	(256, 256)	float32
policy/last_fc	FeaturesDict
policy/last_fc/bias	Tensor	(8,)	float32
policy/last_fc/weight	Tensor	(8, 256)	float32
policy/last_fc_log_std	FeaturesDict
policy/last_fc_log_std/bias	Tensor	(8,)	float32
policy/last_fc_log_std/weight	Tensor	(8, 256)	float32
policy/nonlinearity	Tensor		string
policy/output_distribution	Tensor		string
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float32
steps/infos/qpos	Tensor	(15,)	float32
steps/infos/qvel	Tensor	(14,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v1-medium-expert

Download size: 442.25 MiB
Dataset size: 1.13 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	2,211

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float32,
            'qpos': Tensor(shape=(15,), dtype=float32),
            'qvel': Tensor(shape=(14,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float32
steps/infos/qpos	Tensor	(15,)	float32
steps/infos/qvel	Tensor	(14,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v1-medium-replay

Download size: 132.05 MiB
Dataset size: 175.27 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:

Split	Examples
`'train'`	485

Feature structure:

FeaturesDict({
    'algorithm': string,
    'iteration': int32,
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float64),
        'discount': float64,
        'infos': FeaturesDict({
            'action_log_probs': float64,
            'qpos': Tensor(shape=(15,), dtype=float64),
            'qvel': Tensor(shape=(14,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float64),
        'reward': float64,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
iteration	Tensor		int32
steps	Dataset
steps/action	Tensor	(8,)	float64
steps/discount	Tensor		float64
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float64
steps/infos/qpos	Tensor	(15,)	float64
steps/infos/qvel	Tensor	(14,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float64
steps/reward	Tensor		float64

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v1-full-replay

Download size: 437.57 MiB
Dataset size: 580.09 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,319

Feature structure:

FeaturesDict({
    'algorithm': string,
    'iteration': int32,
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float64),
        'discount': float64,
        'infos': FeaturesDict({
            'action_log_probs': float64,
            'qpos': Tensor(shape=(15,), dtype=float64),
            'qvel': Tensor(shape=(14,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float64),
        'reward': float64,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
iteration	Tensor		int32
steps	Dataset
steps/action	Tensor	(8,)	float64
steps/discount	Tensor		float64
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float64
steps/infos/qpos	Tensor	(15,)	float64
steps/infos/qvel	Tensor	(14,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float64
steps/reward	Tensor		float64

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v1-random

Download size: 225.18 MiB
Dataset size: 583.83 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	5,741

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float32,
            'qpos': Tensor(shape=(15,), dtype=float32),
            'qvel': Tensor(shape=(14,), dtype=float32),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float32
steps/infos/qpos	Tensor	(15,)	float32
steps/infos/qvel	Tensor	(14,)	float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v2-expert

Download size: 355.94 MiB
Dataset size: 969.38 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,035

Feature structure:

FeaturesDict({
    'algorithm': string,
    'iteration': int32,
    'policy': FeaturesDict({
        'fc0': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 111), dtype=float32),
        }),
        'fc1': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 256), dtype=float32),
        }),
        'last_fc': FeaturesDict({
            'bias': Tensor(shape=(8,), dtype=float32),
            'weight': Tensor(shape=(8, 256), dtype=float32),
        }),
        'last_fc_log_std': FeaturesDict({
            'bias': Tensor(shape=(8,), dtype=float32),
            'weight': Tensor(shape=(8, 256), dtype=float32),
        }),
        'nonlinearity': string,
        'output_distribution': string,
    }),
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float64,
            'qpos': Tensor(shape=(15,), dtype=float64),
            'qvel': Tensor(shape=(14,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
iteration	Tensor		int32
policy	FeaturesDict
policy/fc0	FeaturesDict
policy/fc0/bias	Tensor	(256,)	float32
policy/fc0/weight	Tensor	(256, 111)	float32
policy/fc1	FeaturesDict
policy/fc1/bias	Tensor	(256,)	float32
policy/fc1/weight	Tensor	(256, 256)	float32
policy/last_fc	FeaturesDict
policy/last_fc/bias	Tensor	(8,)	float32
policy/last_fc/weight	Tensor	(8, 256)	float32
policy/last_fc_log_std	FeaturesDict
policy/last_fc_log_std/bias	Tensor	(8,)	float32
policy/last_fc_log_std/weight	Tensor	(8, 256)	float32
policy/nonlinearity	Tensor		string
policy/output_distribution	Tensor		string
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float64
steps/infos/qpos	Tensor	(15,)	float64
steps/infos/qvel	Tensor	(14,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v2-full-replay

Download size: 428.57 MiB
Dataset size: 580.09 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,319

Feature structure:

FeaturesDict({
    'algorithm': string,
    'iteration': int32,
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float64,
            'qpos': Tensor(shape=(15,), dtype=float64),
            'qvel': Tensor(shape=(14,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
iteration	Tensor		int32
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float64
steps/infos/qpos	Tensor	(15,)	float64
steps/infos/qvel	Tensor	(14,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v2-medium

Download size: 358.81 MiB
Dataset size: 1.01 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,203

Feature structure:

FeaturesDict({
    'algorithm': string,
    'iteration': int32,
    'policy': FeaturesDict({
        'fc0': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 111), dtype=float32),
        }),
        'fc1': FeaturesDict({
            'bias': Tensor(shape=(256,), dtype=float32),
            'weight': Tensor(shape=(256, 256), dtype=float32),
        }),
        'last_fc': FeaturesDict({
            'bias': Tensor(shape=(8,), dtype=float32),
            'weight': Tensor(shape=(8, 256), dtype=float32),
        }),
        'last_fc_log_std': FeaturesDict({
            'bias': Tensor(shape=(8,), dtype=float32),
            'weight': Tensor(shape=(8, 256), dtype=float32),
        }),
        'nonlinearity': string,
        'output_distribution': string,
    }),
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float64,
            'qpos': Tensor(shape=(15,), dtype=float64),
            'qvel': Tensor(shape=(14,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
iteration	Tensor		int32
policy	FeaturesDict
policy/fc0	FeaturesDict
policy/fc0/bias	Tensor	(256,)	float32
policy/fc0/weight	Tensor	(256, 111)	float32
policy/fc1	FeaturesDict
policy/fc1/bias	Tensor	(256,)	float32
policy/fc1/weight	Tensor	(256, 256)	float32
policy/last_fc	FeaturesDict
policy/last_fc/bias	Tensor	(8,)	float32
policy/last_fc/weight	Tensor	(8, 256)	float32
policy/last_fc_log_std	FeaturesDict
policy/last_fc_log_std/bias	Tensor	(8,)	float32
policy/last_fc_log_std/weight	Tensor	(8, 256)	float32
policy/nonlinearity	Tensor		string
policy/output_distribution	Tensor		string
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float64
steps/infos/qpos	Tensor	(15,)	float64
steps/infos/qvel	Tensor	(14,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v2-medium-expert

Download size: 713.67 MiB
Dataset size: 1.13 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	2,237

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float64,
            'qpos': Tensor(shape=(15,), dtype=float64),
            'qvel': Tensor(shape=(14,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float64
steps/infos/qpos	Tensor	(15,)	float64
steps/infos/qvel	Tensor	(14,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v2-medium-replay

Download size: 130.16 MiB
Dataset size: 175.27 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:

Split	Examples
`'train'`	485

Feature structure:

FeaturesDict({
    'algorithm': string,
    'iteration': int32,
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float64,
            'qpos': Tensor(shape=(15,), dtype=float64),
            'qvel': Tensor(shape=(14,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
algorithm	Tensor		string
iteration	Tensor		int32
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float64
steps/infos/qpos	Tensor	(15,)	float64
steps/infos/qvel	Tensor	(14,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):

d4rl_mujoco_ant/v2-random

Download size: 366.66 MiB
Dataset size: 583.90 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	5,822

Feature structure:

FeaturesDict({
    'steps': Dataset({
        'action': Tensor(shape=(8,), dtype=float32),
        'discount': float32,
        'infos': FeaturesDict({
            'action_log_probs': float64,
            'qpos': Tensor(shape=(15,), dtype=float64),
            'qvel': Tensor(shape=(14,), dtype=float64),
        }),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': Tensor(shape=(111,), dtype=float32),
        'reward': float32,
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
steps	Dataset
steps/action	Tensor	(8,)	float32
steps/discount	Tensor		float32
steps/infos	FeaturesDict
steps/infos/action_log_probs	Tensor		float64
steps/infos/qpos	Tensor	(15,)	float64
steps/infos/qvel	Tensor	(14,)	float64
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	Tensor	(111,)	float32
steps/reward	Tensor		float32

Examples (tfds.as_dataframe):