# controlled_noisy_web_labels

**Warning:** Manual download required. See instructions below.
Description:

Controlled Noisy Web Labels is a collection of ~212,000 image URLs in which
every image has been carefully annotated by three to five labeling professionals
through the Google Cloud Data Labeling Service. Using these annotations, the
dataset establishes the first benchmark of controlled real-world label noise
from the web.
We provide the Red Mini-ImageNet (real-world web noise) and Blue Mini-ImageNet
configs:

- controlled_noisy_web_labels/mini_imagenet_red
- controlled_noisy_web_labels/mini_imagenet_blue
Each config contains ten variants, one per noise level p from 0% to 80%. The
validation set has clean labels and is shared across all noisy training sets.
Therefore, each config has the following splits:

- train_00
- train_05
- train_10
- train_15
- train_20
- train_30
- train_40
- train_50
- train_60
- train_80
- validation
Details of the dataset construction and analysis can be found in the paper. All
images are resized to 84x84 resolution.
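To make the split layout concrete, here is a minimal loading sketch. It assumes TFDS is installed and the manual download steps below have already been completed; the import is deliberately local so the helpers can be defined without TFDS present.

```python
def split_name(noise_percent):
    """Map a noise level in percent to its TFDS split name (e.g. 5 -> "train_05")."""
    levels = (0, 5, 10, 15, 20, 30, 40, 50, 60, 80)
    if noise_percent not in levels:
        raise ValueError(f"no split for noise level {noise_percent}%")
    return f"train_{noise_percent:02d}"


def load_noisy_split(noise_percent, config="mini_imagenet_red"):
    """Load one noisy training split of controlled_noisy_web_labels.

    Assumes the manually downloaded files are already in place. The shared
    clean validation set is available under split="validation".
    """
    import tensorflow_datasets as tfds  # local import: sketch stays importable

    return tfds.load(f"controlled_noisy_web_labels/{config}",
                     split=split_name(noise_percent))
```

Because the validation split is shared, models trained on any of the noisy splits can be compared against the same clean labels.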
Manual download instructions: This dataset requires you to
download the source data manually into `download_config.manual_dir`
(defaults to `~/tensorflow_datasets/downloads/manual/`):
In order to manually download this data, a user must perform the
following operations:

1. Download the splits and the annotations [here](https://storage.googleapis.com/cnlw/dataset_no_images.zip).
2. Extract dataset_no_images.zip to dataset_no_images/.
3. Download all images listed in dataset_no_images/mini-imagenet-annotations.json into
   a new folder named dataset_no_images/noisy_images/. The output filename must
   agree with the image id provided in mini-imagenet-annotations.json. For
   example, if "image/id": "5922767e5677aef4", then the downloaded image should
   be dataset_no_images/noisy_images/5922767e5677aef4.jpg.
4. Register on https://image-net.org/download-images and download ILSVRC2012_img_train.tar
   and ILSVRC2012_img_val.tar.
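Step 3 can be scripted. The sketch below derives each target filename from the image id, as required above; it assumes the annotations file is a JSON list of records carrying the id under "image/id" and the source URL under a key such as "image/url" (the URL key name is an assumption — check it against the actual file before running).

```python
import json
import os
import urllib.request

OUT_DIR = os.path.join("dataset_no_images", "noisy_images")


def target_path(image_id):
    # The downloaded file must be named <image id>.jpg, per the instructions above.
    return os.path.join(OUT_DIR, image_id + ".jpg")


def fetch_all(annotations_file="dataset_no_images/mini-imagenet-annotations.json"):
    """Download every annotated image into OUT_DIR.

    The record layout ("image/id" plus an assumed "image/url" field) should be
    verified against the real mini-imagenet-annotations.json before use.
    """
    with open(annotations_file) as f:
        records = json.load(f)
    os.makedirs(OUT_DIR, exist_ok=True)
    for rec in records:
        urllib.request.urlretrieve(rec["image/url"], target_path(rec["image/id"]))
```

For the example id from step 3, `target_path("5922767e5677aef4")` yields the expected dataset_no_images/noisy_images/5922767e5677aef4.jpg path.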
The resulting directory structure may then be processed by TFDS:
- dataset_no_images/
  - mini-imagenet/
    - class_name.txt
    - split/
      - blue_noise_nl_0.0
      - blue_noise_nl_0.1
      - ...
      - red_noise_nl_0.0
      - red_noise_nl_0.1
      - ...
      - clean_validation
  - mini-imagenet-annotations.json
- ILSVRC2012_img_train.tar
- ILSVRC2012_img_val.tar
- noisy_images/
  - 5922767e5677aef4.jpg

- **Homepage**:
  https://google.github.io/controlled-noisy-web-labels/index.html

- **Source code**:
  [`tfds.image_classification.controlled_noisy_web_labels.ControlledNoisyWebLabels`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/image_classification/controlled_noisy_web_labels/controlled_noisy_web_labels.py)

- **Versions**:

  - **`1.0.0`** (default): Initial release.

- **Download size**: `1.83 MiB`

- **Auto-cached**
  ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
  No

- **Feature structure**:

      FeaturesDict({
          'id': Text(shape=(), dtype=string),
          'image': Image(shape=(None, None, 3), dtype=uint8),
          'is_clean': bool,
          'label': ClassLabel(shape=(), dtype=int64, num_classes=100),
      })

- **Feature documentation**:

| Feature  | Class        | Shape           | Dtype  | Description |
|----------|--------------|-----------------|--------|-------------|
|          | FeaturesDict |                 |        |             |
| id       | Text         |                 | string |             |
| image    | Image        | (None, None, 3) | uint8  |             |
| is_clean | Tensor       |                 | bool   |             |
| label    | ClassLabel   |                 | int64  |             |

- **Supervised keys** (see
  [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):
  `('image', 'label')`

- **Citation**:

      @inproceedings{jiang2020beyond,
        title={Beyond synthetic noise: Deep learning on controlled noisy labels},
        author={Jiang, Lu and Huang, Di and Liu, Mason and Yang, Weilong},
        booktitle={International Conference on Machine Learning},
        pages={4804--4815},
        year={2020},
        organization={PMLR}
      }

## controlled_noisy_web_labels/mini_imagenet_red (default config)

- **Dataset size**: `1.19 GiB`

- **Splits**:

| Split          | Examples |
|----------------|----------|
| `'train_00'`   | 50,000   |
| `'train_05'`   | 50,000   |
| `'train_10'`   | 50,000   |
| `'train_15'`   | 50,000   |
| `'train_20'`   | 50,000   |
| `'train_30'`   | 49,985   |
| `'train_40'`   | 50,010   |
| `'train_50'`   | 49,962   |
| `'train_60'`   | 50,000   |
| `'train_80'`   | 50,008   |
| `'validation'` | 5,000    |

## controlled_noisy_web_labels/mini_imagenet_blue

- **Dataset size**: `1.39 GiB`

- **Splits**:

| Split          | Examples |
|----------------|----------|
| `'train_00'`   | 60,000   |
| `'train_05'`   | 60,000   |
| `'train_10'`   | 60,000   |
| `'train_15'`   | 60,000   |
| `'train_20'`   | 60,000   |
| `'train_30'`   | 60,000   |
| `'train_40'`   | 60,000   |
| `'train_50'`   | 60,000   |
| `'train_60'`   | 60,000   |
| `'train_80'`   | 60,000   |
| `'validation'` | 5,000    |

Last updated 2022-12-06 UTC.