- Description:
ILSVRC 2012, commonly known as 'ImageNet' is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.
The test split contains 100K images but no labels because no labels have been publicly released. We provide support for the test split from 2012 with the minor patch released on October 10, 2019. In order to manually download this data, a user must perform the following operations:
- Download the 2012 test split available here.
- Download the October 10, 2019 patch. There is a Google Drive link to the patch provided on the same page.
- Combine the two tar-balls, manually overwriting any images in the original archive with images from the patch. According to the instructions on image-net.org, this procedure overwrites just a few images.
The resulting tar-ball may then be processed by TFDS.
To assess the accuracy of a model on the ImageNet test split, one must run inference on all images in the split, export those results to a text file that must be uploaded to the ImageNet evaluation server. The maintainers of the ImageNet evaluation server permits a single user to submit up to 2 submissions per week in order to prevent overfitting.
To evaluate the accuracy on the test split, one must first create an account at image-net.org. This account must be approved by the site administrator. After the account is created, one can submit the results to the test server at https://image-net.org/challenges/LSVRC/eval_server.php The submission consists of several ASCII text files corresponding to multiple tasks. The task of interest is "Classification submission (top-5 cls error)". A sample of an exported text file looks like the following:
771 778 794 387 650
363 691 764 923 427
737 369 430 531 124
755 930 755 59 168
The export format is described in full in "readme.txt" within the 2013 development kit available here: https://image-net.org/data/ILSVRC/2013/ILSVRC2013_devkit.tgz Please see the section entitled "3.3 CLS-LOC submission format". Briefly, the format of the text file is 100,000 lines corresponding to each image in the test split. Each line of integers correspond to the rank-ordered, top 5 predictions for each test image. The integers are 1-indexed corresponding to the line number in the corresponding labels file. See labels.txt.
- Additional Documentation: Explore on Papers With Code 
- Homepage: https://image-net.org/ 
- Source code: - tfds.datasets.imagenet2012.Builder
- Versions: - 2.0.0: Fix validation labels.
- 2.0.1: Encoding fix. No changes from user point of view.
- 3.0.0: Fix colorization on ~12 images (CMYK -> RGB). Fix format for consistency (convert the single png image to Jpeg). Faster generation reading directly from the archive.
- 4.0.0: (unpublished)
- 5.0.0: New split API (https://tensorflow.org/datasets/splits)
- 5.1.0(default): Added test split.
 
- Download size: - Unknown size
- Dataset size: - 155.84 GiB
- Manual download instructions: This dataset requires you to download the source data manually into - download_config.manual_dir(defaults to- ~/tensorflow_datasets/downloads/manual/):
 manual_dir should contain two files: ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar. You need to register on https://image-net.org/download-images in order to get the link to download the dataset.
- Auto-cached (documentation): No 
- Splits: 
| Split | Examples | 
|---|---|
| 'test' | 100,000 | 
| 'train' | 1,281,167 | 
| 'validation' | 50,000 | 
- Feature structure:
FeaturesDict({
    'file_name': Text(shape=(), dtype=string),
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=1000),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description | 
|---|---|---|---|---|
| FeaturesDict | ||||
| file_name | Text | string | ||
| image | Image | (None, None, 3) | uint8 | |
| label | ClassLabel | int64 | 
- Supervised keys (See - as_superviseddoc):- ('image', 'label')
- Figure (tfds.show_examples): 

- Examples (tfds.as_dataframe):
- Citation:
@article{ILSVRC15,
Author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
Title = { {ImageNet Large Scale Visual Recognition Challenge} },
Year = {2015},
journal   = {International Journal of Computer Vision (IJCV)},
doi = {10.1007/s11263-015-0816-y},
volume={115},
number={3},
pages={211-252}
}