- Description:
Imagenet2012Subset is a subset of the original ImageNet ILSVRC 2012 dataset. It
shares the same validation set as the original ImageNet ILSVRC 2012 dataset, but
the training set is subsampled in a label-balanced fashion. In the 1pct
configuration, 1% of the training images (12,811) are sampled: most classes have
the same number of images (average 12.8), and some classes randomly have 1 more
example than others. In the 10pct configuration, ~10% of the training images
(128,116) are sampled: most classes have the same number of images (average 128),
and some classes randomly have 1 more example than others.
This dataset is intended as a benchmark for semi-supervised learning and was originally used in the SimCLR paper (https://arxiv.org/abs/2002.05709).
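The per-class averages quoted in the description follow from simple arithmetic. The sketch below assumes that per-class counts differ by at most one image and that there are 1000 classes (the `num_classes` value in the feature structure); `balanced_counts` is a hypothetical helper written for illustration and is not part of TFDS, and the actual sampling used to build the subset may differ.

```python
def balanced_counts(total_images, num_classes=1000):
    """Return {images_per_class: number_of_classes} for an even split.

    Assumes every class gets either floor(total / classes) images or one more.
    """
    base, extra = divmod(total_images, num_classes)
    counts = {base: num_classes - extra}
    if extra:
        # `extra` classes receive one additional image.
        counts[base + 1] = extra
    return counts


print(balanced_counts(12_811))   # 1pct train split size
print(balanced_counts(128_116))  # 10pct train split size
```

Under this assumption, each class in the 1pct configuration would hold 12 or 13 images, and each class in the 10pct configuration would hold 128 or 129.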
Homepage: http://image-net.org/
Source code: tfds.datasets.imagenet2012_subset.Builder
Versions:
- 2.0.0: Fix validation labels.
- 2.0.1: Encoding fix. No changes from user point of view.
- 3.0.0: Fix colorization on ~12 images (CMYK -> RGB). Fix format for consistency (convert the single png image to Jpeg). Faster generation reading directly from the archive.
- 4.0.0: (unpublished)
- 5.0.0 (default): New split API (https://tensorflow.org/datasets/splits)
- 5.1.0: Added test split.
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
manual_dir should contain two files: ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar. You need to register on https://image-net.org/download-images in order to get the link to download the dataset.
Auto-cached (documentation): No
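Before preparing the dataset, it can help to confirm that both archives are in place. The following is a minimal sketch using only the standard library; `missing_archives` is a hypothetical helper for illustration, not a TFDS function, and the directory passed to it is the default manual directory named in the instructions above.

```python
from pathlib import Path

# The two archive names listed in the manual download instructions.
REQUIRED_ARCHIVES = ("ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar")


def missing_archives(manual_dir):
    """Return the required archive names that are not present in manual_dir."""
    manual_dir = Path(manual_dir).expanduser()
    return [name for name in REQUIRED_ARCHIVES
            if not (manual_dir / name).is_file()]


missing = missing_archives("~/tensorflow_datasets/downloads/manual/")
if missing:
    print("Missing archives:", ", ".join(missing))
else:
    print("Both ImageNet archives found.")
```

If a different location is used, the same path would need to be supplied to TFDS as `manual_dir` in the download configuration.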
Feature structure:
FeaturesDict({
    'file_name': Text(shape=(), dtype=string),
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=1000),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
file_name | Text | | string | |
image | Image | (None, None, 3) | uint8 | |
label | ClassLabel | | int64 | |
Supervised keys (See as_supervised doc): ('image', 'label')
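With these supervised keys, loading with `as_supervised=True` yields `(image, label)` pairs instead of feature dictionaries. The sketch below assumes `tensorflow_datasets` is installed and that the source archives have already been placed in the manual directory; `load_supervised` is a hypothetical wrapper for illustration, and the import is deferred so that defining the function does not itself require TFDS.

```python
# Config names come from this page: "1pct" (default) and "10pct".
DATASET_NAME = "imagenet2012_subset/1pct"


def load_supervised(split="train", data_dir=None):
    """Load one split as (image, label) tuples via tfds.load."""
    import tensorflow_datasets as tfds  # deferred import; requires TFDS

    return tfds.load(
        DATASET_NAME,
        split=split,
        as_supervised=True,  # yield ('image', 'label') tuples
        data_dir=data_dir,   # None falls back to the TFDS default directory
    )


# Example usage (needs the prepared dataset on disk):
# for image, label in load_supervised("train").take(1):
#     print(image.shape, label.numpy())
```

Changing `DATASET_NAME` to `"imagenet2012_subset/10pct"` would select the larger configuration.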
Citation:
@article{chen2020simple,
title={A Simple Framework for Contrastive Learning of Visual Representations},
author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey},
journal={arXiv preprint arXiv:2002.05709},
year={2020}
}
@article{ILSVRC15,
Author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
Title = { {ImageNet Large Scale Visual Recognition Challenge} },
Year = {2015},
journal = {International Journal of Computer Vision (IJCV)},
doi = {10.1007/s11263-015-0816-y},
volume={115},
number={3},
pages={211-252}
}
imagenet2012_subset/1pct (default config)
Config description: 1pct of total ImageNet training set.
Download size: 254.22 KiB
Dataset size: 7.61 GiB
Splits:
Split | Examples |
---|---|
'train' | 12,811 |
'validation' | 50,000 |
imagenet2012_subset/10pct
Config description: 10pct of total ImageNet training set.
Download size: 2.48 MiB
Dataset size: 19.91 GiB
Splits:
Split | Examples |
---|---|
'train' | 128,116 |
'validation' | 50,000 |