pass

Description:

PASS is a large-scale image dataset that does not include any humans, human parts, or other personally identifiable information. It can be used for high-quality self-supervised pretraining while significantly reducing privacy concerns.

PASS contains 1,439,589 images without any labels sourced from YFCC-100M.

All images in this dataset are licenced under the CC-BY licence, as is the dataset itself. For YFCC-100M see http://www.multimediacommons.org/

Additional Documentation: Explore on Papers With Code
Homepage: https://www.robots.ox.ac.uk/~vgg/data/pass/
Source code: tfds.datasets.pass.Builder
Versions:
- 1.0.0: Initial release.
- 2.0.0: v2: Removed 472 images from v1 as they contained humans. Also added metadata: datetaken and GPS.
- 3.0.0 (default): v3: Removed 131 images from v2 as they contained humans/tattoos.
Download size: 167.30 GiB
Dataset size: 166.43 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	1,439,588
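
As a minimal sketch of how the versions and the single `'train'` split listed above might be requested, the snippet below builds a `name:version` string and a percentage slice and passes them to `tfds.load`. The helper names (`pass_spec`, `train_slice`, `load_pass_subset`) are hypothetical and not part of TFDS; only `tfds.load`, the `pass:<version>` naming form, and the `train[:N%]` slicing syntax come from the library. Slicing selects examples at read time, so it should not be expected to shrink the 167.30 GiB download.

```python
from typing import Optional, Tuple

# Versions taken from the list above; 3.0.0 is the documented default.
PASS_VERSIONS: Tuple[str, ...] = ("1.0.0", "2.0.0", "3.0.0")


def pass_spec(version: str = "3.0.0") -> str:
    """Hypothetical helper: build the 'name:version' string for tfds.load."""
    if version not in PASS_VERSIONS:
        raise ValueError(f"unknown PASS version {version!r}; expected one of {PASS_VERSIONS}")
    return f"pass:{version}"


def train_slice(percent: int) -> str:
    """Hypothetical helper: the first `percent` percent of the 'train' split."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return f"train[:{percent}%]"


def load_pass_subset(version: str = "3.0.0", percent: int = 1,
                     data_dir: Optional[str] = None):
    """Load a slice of PASS as a tf.data.Dataset (prepares the dataset on first use)."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(pass_spec(version), split=train_slice(percent), data_dir=data_dir)
```

For example, `load_pass_subset("2.0.0", percent=5)` would request the first 5% of the v2 training split, assuming that version is still available to the builder.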

Feature structure:

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'image/creator_uname': Text(shape=(), dtype=string),
    'image/date_taken': Text(shape=(), dtype=string),
    'image/gps_lat': float32,
    'image/gps_lon': float32,
    'image/hash': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
image	Image	(None, None, 3)	uint8
image/creator_uname	Text		string
image/date_taken	Text		string
image/gps_lat	Tensor		float32
image/gps_lon	Tensor		float32
image/hash	Text		string

Supervised keys (See as_supervised doc): None

Citation:

@Article{asano21pass,
author = "Yuki M. Asano and Christian Rupprecht and Andrew Zisserman and Andrea Vedaldi",
title = "PASS: An ImageNet replacement for self-supervised pretraining without humans",
journal = "NeurIPS Track on Datasets and Benchmarks",
year = "2021"
}