s3o4d

Description:

S3O4D is the dataset first described in the "Stanford 3D Objects" section of the paper Disentangling by Subspace Diffusion. It consists of 100,000 renderings each of the Bunny and Dragon objects from the Stanford 3D Scanning Repository. More objects may be added in the future, but only the Bunny and Dragon are used in the paper. Each object is rendered with an illumination direction sampled uniformly from the 2-sphere and with a uniformly sampled 3D rotation. The true latent states are provided as NumPy arrays along with the images. The lighting is given as a unit-norm 3-vector, and the rotation is provided both as a quaternion and as a 3x3 orthogonal matrix.
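The relationship between the latent representations described above can be sketched with plain NumPy. This is a minimal illustration, not code from the dataset's authors: the helper names `quat_to_matrix` and `check_latents` are hypothetical, and the quaternion component order (w, x, y, z) is an assumption that should be verified against `pose_mat` on real examples before relying on it.

```python
import numpy as np


def quat_to_matrix(q):
    """Convert a unit quaternion to a 3x3 rotation matrix.

    ASSUMPTION: components are ordered (w, x, y, z). The dataset card does
    not state the ordering, so compare against 'pose_mat' on real data.
    """
    w, x, y, z = np.asarray(q, dtype=np.float64)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def check_latents(illumination, pose_mat, pose_quat, atol=1e-4):
    """Return boolean sanity checks for one example's latent state."""
    illumination = np.asarray(illumination, dtype=np.float64)
    pose_mat = np.asarray(pose_mat, dtype=np.float64)
    pose_quat = np.asarray(pose_quat, dtype=np.float64)
    return {
        # Lighting should be a 3-vector with unit norm.
        "illumination_unit_norm": bool(
            np.isclose(np.linalg.norm(illumination), 1.0, atol=atol)),
        # A quaternion that encodes a rotation has unit norm.
        "quat_unit_norm": bool(
            np.isclose(np.linalg.norm(pose_quat), 1.0, atol=atol)),
        # Orthogonality: R @ R.T equals the identity.
        "matrix_orthogonal": bool(
            np.allclose(pose_mat @ pose_mat.T, np.eye(3), atol=atol)),
    }


# Synthetic latent state (not taken from the dataset): a 90-degree rotation
# about the z-axis, lit from the +x direction.
quat = np.array([np.sqrt(0.5), 0.0, 0.0, np.sqrt(0.5)], dtype=np.float32)
mat = quat_to_matrix(quat).astype(np.float32)
light = np.array([1.0, 0.0, 0.0], dtype=np.float32)
checks = check_latents(light, mat, quat)
```

On real examples, the same checks could be applied to the `illumination`, `pose_mat`, and `pose_quat` features; a mismatch between `quat_to_matrix(pose_quat)` and `pose_mat` would suggest a different quaternion convention than the one assumed here.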

S3O4D has many similarities to existing ML benchmark datasets such as NORB, 3D Chairs, and 3D Shapes, which also include renderings of a set of objects under different pose and illumination conditions. However, none of these existing datasets includes the full manifold of rotations in 3D; most include only a subset of changes to elevation and azimuth. S3O4D images are sampled uniformly and independently from the full space of rotations and illuminations, meaning the dataset contains objects that are upside down and illuminated from behind or underneath. We believe that this makes S3O4D uniquely suited for research on generative models where the latent space has non-trivial topology, as well as for general manifold learning methods where the curvature of the manifold is important.
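To illustrate what uniform, independent sampling over the full space of rotations and illuminations implies, the sketch below draws synthetic latents with NumPy. It is not the authors' rendering pipeline. It relies on the standard fact that normalized Gaussian vectors are uniform on the sphere, and that uniform unit quaternions correspond to uniform 3D rotations. Treating the z-axis as "up" and the helper names are assumptions made only for this example.

```python
import numpy as np


def sample_unit_vectors(rng, n, dim):
    """Draw n points uniformly from the unit sphere in R^dim."""
    v = rng.standard_normal((n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def rotated_up_axis_z(quats):
    """z-component of the rotated axis (0, 0, 1) for unit quaternions.

    ASSUMPTION: quaternions are ordered (w, x, y, z) and z is 'up'.
    This is the (3, 3) entry of the rotation matrix: 1 - 2 (x^2 + y^2).
    """
    x = quats[:, 1]
    y = quats[:, 2]
    return 1.0 - 2.0 * (x * x + y * y)


rng = np.random.default_rng(0)
n = 20_000
lights = sample_unit_vectors(rng, n, 3)  # directions on the 2-sphere
quats = sample_unit_vectors(rng, n, 4)   # unit quaternions on the 3-sphere

# Fraction of samples whose up axis points downward after rotation, and
# fraction lit from the lower hemisphere. Both should be close to one half.
upside_down = float(np.mean(rotated_up_axis_z(quats) < 0.0))
lit_from_below = float(np.mean(lights[:, 2] < 0.0))
```

Under these assumptions, roughly half of the sampled poses are upside down and roughly half of the lights come from below, which is the kind of coverage that datasets restricted to a range of elevation and azimuth do not provide.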

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/deepmind/deepmind-research/tree/master/geomancer#stanford-3d-objects-for-disentangling-s3o4d
Source code: tfds.datasets.s3o4d.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 911.68 MiB
Dataset size: 1.01 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'bunny_test'`	20,000
`'bunny_train'`	80,000
`'dragon_test'`	20,000
`'dragon_train'`	80,000
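A minimal sketch of how one of the splits listed above might be loaded with `tfds.load`. The helpers `split_name` and `load_split` are hypothetical names introduced for illustration; only the dataset name `s3o4d` and the split names come from this page.

```python
OBJECTS = ("bunny", "dragon")
SPLIT_SIZES = {"train": 80_000, "test": 20_000}  # examples per object


def split_name(obj, part):
    """Build a split name such as 'bunny_train' from its two components."""
    if obj not in OBJECTS:
        raise ValueError(f"unknown object: {obj!r}")
    if part not in SPLIT_SIZES:
        raise ValueError(f"unknown split part: {part!r}")
    return f"{obj}_{part}"


def load_split(obj, part):
    """Load one split as a tf.data.Dataset of feature dictionaries."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("s3o4d", split=split_name(obj, part))


# Total number of examples across all four splits.
total_examples = len(OBJECTS) * sum(SPLIT_SIZES.values())
```

Calling `load_split("bunny", "train")` would download and prepare the dataset on first use, so it is worth checking the download and dataset sizes listed above beforehand.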

Feature structure:

FeaturesDict({
    'illumination': Tensor(shape=(3,), dtype=float32),
    'image': Image(shape=(256, 256, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'pose_mat': Tensor(shape=(3, 3), dtype=float32),
    'pose_quat': Tensor(shape=(4,), dtype=float32),
})
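The feature structure above can be turned into a simple validation and preprocessing step. The sketch below uses a synthetic example dictionary with the listed shapes and dtypes; it assumes examples have already been converted to NumPy arrays (for instance with `tfds.as_numpy`). The names `EXPECTED`, `validate_example`, and `image_to_float` are hypothetical.

```python
import numpy as np

# Shapes and dtypes copied from the feature structure on this page.
EXPECTED = {
    "illumination": ((3,), np.float32),
    "image": ((256, 256, 3), np.uint8),
    "label": ((), np.int64),
    "pose_mat": ((3, 3), np.float32),
    "pose_quat": ((4,), np.float32),
}


def validate_example(example):
    """Raise ValueError if a feature has an unexpected shape or dtype."""
    for key, (shape, dtype) in EXPECTED.items():
        value = np.asarray(example[key])
        if value.shape != shape or value.dtype != np.dtype(dtype):
            raise ValueError(
                f"{key}: got shape {value.shape}, dtype {value.dtype}")


def image_to_float(image):
    """Scale a uint8 image to float32 values in [0, 1]."""
    return np.asarray(image).astype(np.float32) / 255.0


# Synthetic stand-in for one dataset example (not real data).
dummy = {
    "illumination": np.array([0.0, 0.0, 1.0], dtype=np.float32),
    "image": np.full((256, 256, 3), 255, dtype=np.uint8),
    "label": np.int64(0),
    "pose_mat": np.eye(3, dtype=np.float32),
    "pose_quat": np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32),
}
validate_example(dummy)
scaled = image_to_float(dummy["image"])
```

The page does not say which label index corresponds to which object, so the `label` value in the dummy example is arbitrary.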

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
illumination	Tensor	(3,)	float32
image	Image	(256, 256, 3)	uint8
label	ClassLabel		int64
pose_mat	Tensor	(3, 3)	float32
pose_quat	Tensor	(4,)	float32

Supervised keys (See as_supervised doc): None
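Because no supervised keys are defined, loading with `as_supervised=True` should not be expected to work for this dataset. A small helper can build `(input, target)` pairs from the feature dictionary instead. This is a sketch under the assumption that the image is the input and one of the latent features is the target; `to_supervised` and `LATENT_KEYS` are hypothetical names.

```python
import numpy as np

# Features that could plausibly serve as prediction targets.
LATENT_KEYS = ("illumination", "label", "pose_mat", "pose_quat")


def to_supervised(example, target_key="pose_quat"):
    """Return an (image, target) pair from one feature dictionary."""
    if target_key not in LATENT_KEYS:
        raise KeyError(f"unsupported target: {target_key!r}")
    return example["image"], example[target_key]


# Synthetic example with the shapes listed in the feature structure.
example = {
    "illumination": np.array([0.0, 1.0, 0.0], dtype=np.float32),
    "image": np.zeros((256, 256, 3), dtype=np.uint8),
    "label": np.int64(1),
    "pose_mat": np.eye(3, dtype=np.float32),
    "pose_quat": np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32),
}
image, target = to_supervised(example, target_key="illumination")
```

With a `tf.data.Dataset`, the same idea could be applied through the dataset's `map` method, using a function that indexes the feature dictionary in the same way.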

Citation:

@article{pfau2020disentangling,
  title={Disentangling by Subspace Diffusion},
  author={Pfau, David and Higgins, Irina and Botev, Aleksandar and Racani\`ere, S{\'e}bastien},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2020}
}