s3o4d

This dataset was first described in the "Stanford 3D Objects" section of the paper Disentangling by Subspace Diffusion. The data consists of 100,000 renderings each of the Bunny and Dragon objects from the Stanford 3D Scanning Repository. More objects may be added in the future, but only the Bunny and Dragon are used in the paper. Each object is rendered with illumination from a point sampled uniformly on the 2-sphere and with a uniformly sampled 3D rotation. The true latent states are provided as NumPy arrays along with the images: the lighting is given as a unit-norm 3-vector, and the rotation is given both as a quaternion and as a 3x3 orthogonal (rotation) matrix.
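The sampling described above can be sketched in NumPy. This is an illustrative sketch, not the pipeline used to build S3O4D. It draws light directions by normalizing isotropic Gaussian 3-vectors, which yields the uniform distribution on the 2-sphere, and draws rotations by normalizing isotropic Gaussian 4-vectors, which yields uniformly distributed unit quaternions and therefore uniformly distributed rotations. The function names and the scalar-first (w, x, y, z) quaternion order are assumptions; the component order used by `pose_quat` is not stated on this page.

```python
import numpy as np

def sample_illumination(n, rng):
    """Uniform points on the 2-sphere: normalize isotropic Gaussian 3-vectors."""
    v = rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def sample_rotation_quat(n, rng):
    """Uniform rotations as unit quaternions: normalize isotropic Gaussian 4-vectors.

    Scalar-first (w, x, y, z) order is an assumption of this sketch.
    """
    q = rng.standard_normal((n, 4))
    return q / np.linalg.norm(q, axis=1, keepdims=True)

def quat_to_matrix(q):
    """Convert unit quaternions (w, x, y, z), shape (n, 4), to rotation matrices (n, 3, 3)."""
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)], axis=-1),
        np.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)], axis=-1),
        np.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)], axis=-1),
    ], axis=1)

rng = np.random.default_rng(0)
light = sample_illumination(5, rng)   # shape (5, 3), like 'illumination'
quat = sample_rotation_quat(5, rng)   # shape (5, 4), like 'pose_quat'
mat = quat_to_matrix(quat)            # shape (5, 3, 3), like 'pose_mat'
```

Because q and -q map to the same matrix, the quaternion and matrix encodings are not interchangeable one-to-one; the matrix is the unambiguous one.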

S3O4D has much in common with existing ML benchmark datasets such as NORB, 3D Chairs, 3D Shapes and many others, which also contain renderings of a set of objects under different pose and illumination conditions. However, none of these datasets covers the full manifold of 3D rotations; most include only a subset of elevation and azimuth changes. S3O4D images are sampled uniformly and independently from the full space of rotations and illuminations, so the dataset contains objects that are upside down or lit from behind or underneath. We believe this makes S3O4D uniquely suited to research on generative models whose latent space has non-trivial topology, and to manifold learning methods in general where the curvature of the manifold matters.
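To give a sense of how often objects appear upside down under uniform sampling, the sketch below estimates by Monte Carlo the fraction of uniformly sampled rotations that tilt an object's "up" axis below the horizontal. It assumes, hypothetically, that the object's up direction is +z in its model frame, and it uses normalized Gaussian quaternions as a stand-in for the dataset's actual sampler. A uniform rotation carries a fixed unit vector to a uniform point on the 2-sphere, whose z-coordinate is uniform on [-1, 1], so the expected fraction is 1/2.

```python
import numpy as np

def upside_down_fraction(n, seed=0):
    """Fraction of uniformly sampled rotations that tilt the +z axis below the horizon.

    Assumes (hypothetically) that the object's "up" direction is +z in its model frame.
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)  # uniform unit quaternions (w, x, y, z)
    w, x, y, z = q.T
    # z-component of R @ [0, 0, 1], i.e. entry R[2, 2] for a scalar-first unit quaternion
    up_z = 1 - 2 * (x * x + y * y)
    return float(np.mean(up_z < 0))

print(upside_down_fraction(200_000))  # close to 0.5
```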

Split           Examples
'bunny_test'    20,000
'bunny_train'   80,000
'dragon_test'   20,000
'dragon_train'  80,000
  • Feature structure:
FeaturesDict({
    'illumination': Tensor(shape=(3,), dtype=float32),
    'image': Image(shape=(256, 256, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'pose_mat': Tensor(shape=(3, 3), dtype=float32),
    'pose_quat': Tensor(shape=(4,), dtype=float32),
})
  • Feature documentation:
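Feature        Class          Shape          Dtype    Description
               FeaturesDict
illumination   Tensor         (3,)           float32  Lighting as a unit-norm 3-vector (point on the 2-sphere)
image          Image          (256, 256, 3)  uint8    Rendered image
label          ClassLabel     ()             int64    Object identity (Bunny or Dragon)
pose_mat       Tensor         (3, 3)         float32  Rotation as a 3x3 orthogonal matrix
pose_quat      Tensor         (4,)           float32  Rotation as a quaternion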
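When preprocessing the data, it can help to sanity-check examples against the documented shapes and the constraints stated above (unit-norm illumination, orthogonal rotation matrix, unit quaternion, two classes). The hypothetical helper `check_example` below does this for one example that has already been converted to NumPy arrays, for instance with `tfds.as_numpy`. The tolerance is an assumption chosen to allow for float32 storage. It deliberately does not check that `pose_quat` and `pose_mat` agree, since the quaternion component order is not documented here.

```python
import numpy as np

def check_example(ex, atol=1e-4):
    """Return a list of problems found in one example (an empty list means it looks valid)."""
    problems = []
    img = np.asarray(ex["image"])
    if img.shape != (256, 256, 3) or img.dtype != np.uint8:
        problems.append("image: expected uint8 array of shape (256, 256, 3)")
    light = np.asarray(ex["illumination"], dtype=np.float64)
    if light.shape != (3,) or abs(np.linalg.norm(light) - 1.0) > atol:
        problems.append("illumination: expected a unit-norm 3-vector")
    mat = np.asarray(ex["pose_mat"], dtype=np.float64)
    if (mat.shape != (3, 3)
            or not np.allclose(mat @ mat.T, np.eye(3), atol=atol)  # orthogonality
            or abs(np.linalg.det(mat) - 1.0) > atol):               # proper rotation, not a reflection
        problems.append("pose_mat: expected a 3x3 rotation matrix")
    quat = np.asarray(ex["pose_quat"], dtype=np.float64)
    if quat.shape != (4,) or abs(np.linalg.norm(quat) - 1.0) > atol:
        problems.append("pose_quat: expected a unit quaternion")
    if int(ex["label"]) not in (0, 1):  # num_classes=2
        problems.append("label: expected 0 or 1")
    return problems

# Synthetic example with the documented shapes and dtypes (identity pose, light along +z).
ex = {
    "image": np.zeros((256, 256, 3), dtype=np.uint8),
    "illumination": np.array([0.0, 0.0, 1.0], dtype=np.float32),
    "pose_mat": np.eye(3, dtype=np.float32),
    "pose_quat": np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32),
    "label": np.int64(1),
}
print(check_example(ex))  # []
```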


  • Citation:
@article{pfau2020disentangling,
  title={Disentangling by Subspace Diffusion},
  author={Pfau, David and Higgins, Irina and Botev, Aleksandar and Racani{\`e}re,
  S{\'e}bastien},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2020}
}