stanford_robocook_converted_externally_to_rlds
Franka preparing dumplings with various tools
| Split     | Examples |
|-----------|----------|
| `'train'` | 2,460    |
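The single `'train'` split can be read with TensorFlow Datasets. The following is a minimal sketch, assuming `tensorflow_datasets` is installed and the dataset has already been prepared under a local `data_dir`. The helper names `split_spec` and `load_episodes` are hypothetical and not part of TFDS; only `tfds.load` and the `'train[:N]'` slicing syntax come from the library.

```python
DATASET_NAME = "stanford_robocook_converted_externally_to_rlds"
NUM_TRAIN_EPISODES = 2460  # size of the 'train' split listed above


def split_spec(num_episodes=None):
    """Build a TFDS split string, e.g. 'train[:10]' for the first 10 episodes."""
    if num_episodes is None:
        return "train"
    if not 0 < num_episodes <= NUM_TRAIN_EPISODES:
        raise ValueError(f"num_episodes must be in 1..{NUM_TRAIN_EPISODES}")
    return f"train[:{num_episodes}]"


def load_episodes(data_dir, num_episodes=None):
    """Return a tf.data.Dataset of episodes (hypothetical helper).

    TensorFlow Datasets is imported lazily so that this module can be
    imported without TensorFlow installed. Assumes the dataset is already
    available under `data_dir`.
    """
    import tensorflow_datasets as tfds

    return tfds.load(
        DATASET_NAME,
        split=split_spec(num_episodes),
        data_dir=data_dir,
    )
```

Reading a small slice first, for example `load_episodes(data_dir, 2)` with your own `data_dir`, may be preferable to iterating over the full split, since each episode carries four RGB and four depth streams.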
FeaturesDict({
    'episode_metadata': FeaturesDict({
        'extrinsics_1': Tensor(shape=(4, 4), dtype=float32, description=Camera 1 Extrinsic Matrix.),
        'extrinsics_2': Tensor(shape=(4, 4), dtype=float32, description=Camera 2 Extrinsic Matrix.),
        'extrinsics_3': Tensor(shape=(4, 4), dtype=float32, description=Camera 3 Extrinsic Matrix.),
        'extrinsics_4': Tensor(shape=(4, 4), dtype=float32, description=Camera 4 Extrinsic Matrix.),
        'file_path': Text(shape=(), dtype=string),
    }),
    'steps': Dataset({
        'action': Tensor(shape=(7,), dtype=float32, description=Robot action, consists of [3x robot end-effector velocities, 3x robot end-effector angular velocities, 1x gripper velocity].),
        'discount': Scalar(shape=(), dtype=float32, description=Discount if provided, default to 1.),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'language_embedding': Tensor(shape=(512,), dtype=float32, description=Kona language embedding. See https://tfhub.dev/google/universal-sentence-encoder-large/5),
        'language_instruction': Text(shape=(), dtype=string),
        'observation': FeaturesDict({
            'depth_1': Tensor(shape=(256, 256), dtype=float32, description=Camera 1 Depth observation.),
            'depth_2': Tensor(shape=(256, 256), dtype=float32, description=Camera 2 Depth observation.),
            'depth_3': Tensor(shape=(256, 256), dtype=float32, description=Camera 3 Depth observation.),
            'depth_4': Tensor(shape=(256, 256), dtype=float32, description=Camera 4 Depth observation.),
            'image_1': Image(shape=(256, 256, 3), dtype=uint8, description=Camera 1 RGB observation.),
            'image_2': Image(shape=(256, 256, 3), dtype=uint8, description=Camera 2 RGB observation.),
            'image_3': Image(shape=(256, 256, 3), dtype=uint8, description=Camera 3 RGB observation.),
            'image_4': Image(shape=(256, 256, 3), dtype=uint8, description=Camera 4 RGB observation.),
            'state': Tensor(shape=(7,), dtype=float32, description=Robot state, consists of [3x robot end-effector position, 3x robot end-effector euler angles, 1x gripper position].),
        }),
        'reward': Scalar(shape=(), dtype=float32, description=Reward if provided, 1 on final step for demos.),
    }),
})
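The 7-element `action` and `state` vectors in the structure above pack three groups of values each. The sketch below shows one way to split them into named fields, using plain Python lists and a synthetic step with made-up numbers. The class and function names are hypothetical, and the source does not state units, so none are assumed here. With real data you would likely convert each tensor to a list or array first.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Action:
    linear_velocity: List[float]   # 3x end-effector velocities
    angular_velocity: List[float]  # 3x end-effector angular velocities
    gripper_velocity: float        # 1x gripper velocity


@dataclass
class State:
    position: List[float]      # 3x end-effector position
    euler_angles: List[float]  # 3x end-effector euler angles
    gripper_position: float    # 1x gripper position


def _as_seven_floats(vec: Sequence[float], name: str) -> List[float]:
    """Convert to floats and check the (7,) shape given in the feature spec."""
    values = [float(v) for v in vec]
    if len(values) != 7:
        raise ValueError(f"{name} must have 7 entries, got {len(values)}")
    return values


def parse_action(vec: Sequence[float]) -> Action:
    v = _as_seven_floats(vec, "action")
    return Action(v[0:3], v[3:6], v[6])


def parse_state(vec: Sequence[float]) -> State:
    v = _as_seven_floats(vec, "state")
    return State(v[0:3], v[3:6], v[6])


# Synthetic step with made-up values, shaped like one element of `steps`.
step = {
    "action": [0.01, 0.0, -0.02, 0.0, 0.0, 0.1, 0.5],
    "observation": {"state": [0.4, 0.0, 0.3, 3.14, 0.0, 0.0, 0.04]},
    "is_first": True,
    "is_last": False,
    "is_terminal": False,
}

action = parse_action(step["action"])
state = parse_state(step["observation"]["state"])
print(action.linear_velocity, state.gripper_position)
```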
| Feature                       | Class        | Shape         | Dtype   | Description |
|-------------------------------|--------------|---------------|---------|-------------|
|                               | FeaturesDict |               |         |             |
| episode_metadata              | FeaturesDict |               |         |             |
| episode_metadata/extrinsics_1 | Tensor       | (4, 4)        | float32 | Camera 1 Extrinsic Matrix. |
| episode_metadata/extrinsics_2 | Tensor       | (4, 4)        | float32 | Camera 2 Extrinsic Matrix. |
| episode_metadata/extrinsics_3 | Tensor       | (4, 4)        | float32 | Camera 3 Extrinsic Matrix. |
| episode_metadata/extrinsics_4 | Tensor       | (4, 4)        | float32 | Camera 4 Extrinsic Matrix. |
| episode_metadata/file_path    | Text         |               | string  | Path to the original data file. |
| steps                         | Dataset      |               |         |             |
| steps/action                  | Tensor       | (7,)          | float32 | Robot action, consists of \[3x robot end-effector velocities, 3x robot end-effector angular velocities, 1x gripper velocity\]. |
| steps/discount                | Scalar       |               | float32 | Discount if provided, default to 1. |
| steps/is_first                | Tensor       |               | bool    |             |
| steps/is_last                 | Tensor       |               | bool    |             |
| steps/is_terminal             | Tensor       |               | bool    |             |
| steps/language_embedding      | Tensor       | (512,)        | float32 | Kona language embedding. See <https://tfhub.dev/google/universal-sentence-encoder-large/5> |
| steps/language_instruction    | Text         |               | string  | Language Instruction. |
| steps/observation             | FeaturesDict |               |         |             |
| steps/observation/depth_1     | Tensor       | (256, 256)    | float32 | Camera 1 Depth observation. |
| steps/observation/depth_2     | Tensor       | (256, 256)    | float32 | Camera 2 Depth observation. |
| steps/observation/depth_3     | Tensor       | (256, 256)    | float32 | Camera 3 Depth observation. |
| steps/observation/depth_4     | Tensor       | (256, 256)    | float32 | Camera 4 Depth observation. |
| steps/observation/image_1     | Image        | (256, 256, 3) | uint8   | Camera 1 RGB observation. |
| steps/observation/image_2     | Image        | (256, 256, 3) | uint8   | Camera 2 RGB observation. |
| steps/observation/image_3     | Image        | (256, 256, 3) | uint8   | Camera 3 RGB observation. |
| steps/observation/image_4     | Image        | (256, 256, 3) | uint8   | Camera 4 RGB observation. |
| steps/observation/state       | Tensor       | (7,)          | float32 | Robot state, consists of \[3x robot end-effector position, 3x robot end-effector euler angles, 1x gripper position\]. |
| steps/reward                  | Scalar       |               | float32 | Reward if provided, 1 on final step for demos. |
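Each `episode_metadata/extrinsics_N` entry in the table above is a 4x4 matrix. The sketch below shows how such a matrix could be applied to a 3D point in homogeneous coordinates, and how a rigid transform can be inverted. This page does not say whether the extrinsics map world to camera or camera to world, so the direction is an assumption you would need to verify against the original data. The function names and the example matrix are hypothetical.

```python
def apply_transform(matrix, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    if len(matrix) != 4 or any(len(row) != 4 for row in matrix):
        raise ValueError("matrix must be 4x4")
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    out = [sum(row[i] * homogeneous[i] for i in range(4)) for row in matrix]
    w = out[3]  # equals 1 for a rigid transform
    return [out[0] / w, out[1] / w, out[2] / w]


def invert_rigid(matrix):
    """Invert a rigid transform [R | t] as [R^T | -R^T t].

    Assumes the upper-left 3x3 block is a pure rotation, which this page
    does not confirm for the dataset's extrinsics.
    """
    rot_t = [[matrix[c][r] for c in range(3)] for r in range(3)]
    t = [matrix[r][3] for r in range(3)]
    new_t = [-sum(rot_t[r][c] * t[c] for c in range(3)) for r in range(3)]
    return [rot_t[r] + [new_t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Made-up example: 90 degree rotation about z, then translation by (1, 2, 3).
example = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
moved = apply_transform(example, [1.0, 0.0, 0.0])
restored = apply_transform(invert_rigid(example), moved)
print(moved, restored)
```

Applying the inverse recovers the original point, which is a quick sanity check when deciding which direction a given extrinsic matrix maps.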
@article{shi2023robocook,
title={RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools},
author={Shi, Haochen and Xu, Huazhe and Clarke, Samuel and Li, Yunzhu and Wu, Jiajun},
journal={arXiv preprint arXiv:2306.14447},
year={2023}
}
Last updated 2024-12-11 UTC.
- **Homepage**: <https://hshi74.github.io/robocook/>
- **Source code**: [`tfds.robotics.rtx.StanfordRobocookConvertedExternallyToRlds`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/robotics/rtx/rtx.py)
- **Versions**:
  - **`0.1.0`** (default): Initial release.
- **Download size**: `Unknown size`
- **Dataset size**: `124.59 GiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): No
- **Supervised keys** (See [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `None`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.