Imagenet2012Subset is a subset of the original ImageNet ILSVRC 2012 dataset. It shares the same validation set as the original dataset, but the training set is subsampled in a label-balanced fashion. In the 1pct configuration, 1% of the training images (12,811) are sampled: most classes have the same number of images (12.8 on average), and some classes randomly have one more example than others. In the 10pct configuration, ~10% of the training images (128,116) are sampled: most classes have the same number of images (128 on average), and some classes randomly have one more example than others.
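
The per-class counts described above can be checked with a little arithmetic. The sketch below assumes that every class receives either n or n + 1 images, which is how the description reads, and that there are 1,000 classes (the num_classes value listed in the feature structure). The helper name per_class_counts is hypothetical and is not part of any library.

```python
def per_class_counts(total_images, num_classes=1000):
    """Split total_images across classes so that counts differ by at most one.

    Returns (base, classes_with_extra, classes_without_extra): every class
    gets `base` images and `classes_with_extra` classes get one more.
    """
    base, classes_with_extra = divmod(total_images, num_classes)
    return base, classes_with_extra, num_classes - classes_with_extra


# Training-set sizes of the two configurations, as stated in the description.
for config, total in [("1pct", 12811), ("10pct", 128116)]:
    base, with_extra, without_extra = per_class_counts(total)
    print(f"{config}: {without_extra} classes x {base} images, "
          f"{with_extra} classes x {base + 1} images, "
          f"mean {total / 1000:.1f} per class")
```

Under that assumption the class sizes are 12 or 13 images for 1pct and 128 or 129 images for 10pct. Which classes receive the extra example is random and cannot be derived from the totals.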

It is intended as a benchmark for semi-supervised learning and was originally used in the SimCLR paper.

  • Versions:

    • 2.0.0: Fix validation labels.
    • 2.0.1: Encoding fix. No changes from the user's point of view.
    • 3.0.0: Fix colorization on ~12 images (CMYK -> RGB). Fix format for consistency (convert the single PNG image to JPEG). Faster generation by reading directly from the archive.
    • 4.0.0: (unpublished)
    • 5.0.0 (default): New split API.
    • 5.1.0: Added test split.

  • Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
    manual_dir should contain two files: ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar. You need to register with ImageNet to get the download link.

  • Feature structure:

    'file_name': Text(shape=(), dtype=string),
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=1000),

  • Feature documentation:

Feature    Class       Shape            Dtype   Description
file_name  Text                         string
image      Image       (None, None, 3)  uint8
label      ClassLabel                   int64

imagenet2012_subset/1pct (default config)

  • Config description: 1pct of total ImageNet training set.

  • Download size: 254.22 KiB

  • Dataset size: 7.61 GiB

  • Splits:

Split         Examples
'train'       12,811
'validation'  50,000

imagenet2012_subset/10pct

  • Config description: 10pct of total ImageNet training set.

  • Download size: 2.48 MiB

  • Dataset size: 19.91 GiB

  • Splits:

Split         Examples
'train'       128,116
'validation'  50,000