@inproceedings{jiang2020beyond,
title={Beyond synthetic noise: Deep learning on controlled noisy labels},
author={Jiang, Lu and Huang, Di and Liu, Mason and Yang, Weilong},
booktitle={International Conference on Machine Learning},
pages={4804--4815},
year={2020},
organization={PMLR}
}
# controlled_noisy_web_labels

Last updated 2022-12-06 UTC.

**Warning:** Manual download required. See instructions below.

- **Description**:

Controlled Noisy Web Labels is a collection of about 212,000 image URLs in which
every image is annotated by 3-5 labeling professionals from the Google Cloud
Data Labeling Service. Using these annotations, it establishes the first
benchmark of controlled real-world label noise from the web.

We provide the Red Mini-ImageNet (real-world web noise) and Blue Mini-ImageNet
configs:

- controlled_noisy_web_labels/mini_imagenet_red
- controlled_noisy_web_labels/mini_imagenet_blue

Each config contains ten variants, one per noise level p from 0% to 80%. The
validation set has clean labels and is shared across all noisy training sets.
Therefore, each config has the following splits:

- train_00
- train_05
- train_10
- train_15
- train_20
- train_30
- train_40
- train_50
- train_60
- train_80
- validation

Details of the dataset construction and analysis can be found in the paper
(Jiang et al., 2020). All images are resized to 84x84 resolution.

- **Homepage**:
  <https://google.github.io/controlled-noisy-web-labels/index.html>

- **Source code**:
  [`tfds.image_classification.controlled_noisy_web_labels.ControlledNoisyWebLabels`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/image_classification/controlled_noisy_web_labels/controlled_noisy_web_labels.py)

- **Versions**:

  - **`1.0.0`** (default): Initial release.

- **Download size**: `1.83 MiB`

- **Manual download instructions**: This dataset requires you to
  download the source data manually into `download_config.manual_dir`
  (defaults to `~/tensorflow_datasets/downloads/manual/`).

  To download the data manually, perform the following steps:

1. Download the splits and the annotations [here](https://storage.googleapis.com/cnlw/dataset_no_images.zip).
2. Extract dataset_no_images.zip to dataset_no_images/.
3. Download all images in dataset_no_images/mini-imagenet-annotations.json into a new folder named dataset_no_images/noisy_images/. The output filename must agree with the image id provided in mini-imagenet-annotations.json. For example, if "image/id": "5922767e5677aef4", then the downloaded image should be dataset_no_images/noisy_images/5922767e5677aef4.jpg.
4. Register on <https://image-net.org/download-images> and download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar.

The resulting directory structure may then be processed by TFDS:

- dataset_no_images/
  - mini-imagenet/
    - class_name.txt
    - split/
      - blue_noise_nl_0.0
      - blue_noise_nl_0.1
      - ...
      - red_noise_nl_0.0
      - red_noise_nl_0.1
      - ...
      - clean_validation
  - mini-imagenet-annotations.json
- ILSVRC2012_img_train.tar
- ILSVRC2012_img_val.tar
- noisy_images/
  - 5922767e5677aef4.jpg

- **Auto-cached**
  ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
  No

- **Feature structure**:

      FeaturesDict({
          'id': Text(shape=(), dtype=string),
          'image': Image(shape=(None, None, 3), dtype=uint8),
          'is_clean': bool,
          'label': ClassLabel(shape=(), dtype=int64, num_classes=100),
      })

- **Feature documentation**:

| Feature  | Class        | Shape           | Dtype  | Description |
|----------|--------------|-----------------|--------|-------------|
|          | FeaturesDict |                 |        |             |
| id       | Text         |                 | string |             |
| image    | Image        | (None, None, 3) | uint8  |             |
| is_clean | Tensor       |                 | bool   |             |
| label    | ClassLabel   |                 | int64  |             |

- **Supervised keys** (See
  [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):
  `('image', 'label')`

- **Citation**: `jiang2020beyond` (Jiang et al., 2020; see the BibTeX entry above).

controlled_noisy_web_labels/mini_imagenet_red (default config)
--------------------------------------------------------------

- **Dataset size**: `1.19 GiB`

- **Splits**:

| Split          | Examples |
|----------------|----------|
| `'train_00'`   | 50,000   |
| `'train_05'`   | 50,000   |
| `'train_10'`   | 50,000   |
| `'train_15'`   | 50,000   |
| `'train_20'`   | 50,000   |
| `'train_30'`   | 49,985   |
| `'train_40'`   | 50,010   |
| `'train_50'`   | 49,962   |
| `'train_60'`   | 50,000   |
| `'train_80'`   | 50,008   |
| `'validation'` | 5,000    |

controlled_noisy_web_labels/mini_imagenet_blue
----------------------------------------------

- **Dataset size**: `1.39 GiB`

- **Splits**:

| Split          | Examples |
|----------------|----------|
| `'train_00'`   | 60,000   |
| `'train_05'`   | 60,000   |
| `'train_10'`   | 60,000   |
| `'train_15'`   | 60,000   |
| `'train_20'`   | 60,000   |
| `'train_30'`   | 60,000   |
| `'train_40'`   | 60,000   |
| `'train_50'`   | 60,000   |
| `'train_60'`   | 60,000   |
| `'train_80'`   | 60,000   |
| `'validation'` | 5,000    |
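
The split naming and the manual-download setting described above can be tied together in a short loading sketch. This is an illustration, not part of the TFDS documentation: the helper names `train_split` and `load_noisy_split` are hypothetical, and the code assumes that the numeric suffix of each `train_XX` split is the noise level p in percent. `tensorflow_datasets` is imported lazily, so the split helper can be used without it installed.

```python
# Noise levels (in percent) for which a training split is listed above.
NOISE_LEVELS = (0, 5, 10, 15, 20, 30, 40, 50, 60, 80)
CONFIGS = ("mini_imagenet_red", "mini_imagenet_blue")


def train_split(noise_percent):
    """Return the split name for a noise level in percent (hypothetical helper).

    Assumes the suffix of `train_XX` encodes the noise level p in percent.
    """
    if noise_percent not in NOISE_LEVELS:
        raise ValueError(f"no training split for noise level {noise_percent}%")
    return f"train_{int(noise_percent):02d}"


def load_noisy_split(config, noise_percent, manual_dir=None):
    """Load one noisy training split plus the shared clean validation split.

    `manual_dir` should point at the manually downloaded files; when it is
    None, TFDS falls back to its default manual directory.
    """
    import tensorflow_datasets as tfds  # lazy import: requires tensorflow-datasets

    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config}")

    prepare_kwargs = None
    if manual_dir is not None:
        prepare_kwargs = {
            "download_config": tfds.download.DownloadConfig(manual_dir=manual_dir)
        }

    # as_supervised=True yields (image, label) pairs, matching the
    # supervised keys listed above.
    train_ds, val_ds = tfds.load(
        f"controlled_noisy_web_labels/{config}",
        split=[train_split(noise_percent), "validation"],
        as_supervised=True,
        download_and_prepare_kwargs=prepare_kwargs,
    )
    return train_ds, val_ds
```

For example, `load_noisy_split("mini_imagenet_red", 40)` would request the `train_40` and `validation` splits of the red config, provided the manual download has been completed first.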