controlled_noisy_web_labels

Description:

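Controlled Noisy Web Labels is a collection of about 212,000 URLs to images. Every image was carefully annotated by 3 to 5 labeling professionals through the Google Cloud Data Labeling Service. These annotations are used to establish the first benchmark of controlled real-world label noise from the web.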

We provide the Red Mini-ImageNet (real-world web noise) and Blue Mini-ImageNet configs:
- controlled_noisy_web_labels/mini_imagenet_red
- controlled_noisy_web_labels/mini_imagenet_blue

Each config contains ten variants, one for each of ten noise levels p ranging from 0% to 80%. The validation set has clean labels and is shared across all noisy training sets. Each config therefore has the following splits:

- train_00
- train_05
- train_10
- train_15
- train_20
- train_30
- train_40
- train_50
- train_60
- train_80
- validation
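The split names encode the noise level as a two-digit percentage. Below is a minimal sketch of that mapping. It assumes you want to choose a split from a noise level p given as a fraction in [0, 1]. The names `dataset_and_split`, `NOISE_LEVELS`, `CONFIGS` and `VALIDATION_SPLIT` are hypothetical helpers, not part of TFDS.

```python
from typing import Tuple

# Noise levels and configs listed above; these constant names are illustrative only.
NOISE_LEVELS = (0.0, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8)
CONFIGS = ("mini_imagenet_red", "mini_imagenet_blue")

# The clean validation split is shared by every noise level.
VALIDATION_SPLIT = "validation"


def dataset_and_split(config: str, p: float) -> Tuple[str, str]:
    """Return (dataset_name, split_name) for a config and a noise level p in [0, 1]."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    # Work in whole percentages; the tolerance absorbs float error such as 0.15 * 100.
    percent = round(p * 100)
    valid = {round(level * 100) for level in NOISE_LEVELS}
    if abs(p * 100 - percent) > 1e-6 or percent not in valid:
        raise ValueError(f"no training split for noise level {p}")
    return f"controlled_noisy_web_labels/{config}", f"train_{percent:02d}"


name, split = dataset_and_split("mini_imagenet_red", 0.2)
print(name, split)  # controlled_noisy_web_labels/mini_imagenet_red train_20
```

Once the manual download described below is in place, the returned pair could be passed as the name and `split` arguments of `tfds.load`.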

Details on dataset construction and analysis can be found in the paper. All images are resized to 84x84 resolution.

Homepage: https://google.github.io/controlled-noisy-web-labels/index.html
Source code: tfds.image_classification.controlled_noisy_web_labels.ControlledNoisyWebLabels
Versions:
- 1.0.0 (default): Initial release.
Download size: 1.83 MiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/).
To download this data manually, you must perform the following operations:

1. Download the splits and the annotations from here.
2. Extract dataset_no_images.zip to dataset_no_images/.
3. Download all images listed in dataset_no_images/mini-imagenet-annotations.json into a new folder named dataset_no_images/noisy_images/. The output file names must match the image IDs given in mini-imagenet-annotations.json. For example, if "image/id": "5922767e5677aef4", the downloaded image should be dataset_no_images/noisy_images/5922767e5677aef4.jpg.
4. Register at https://image-net.org/download-images and download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar.
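Step 3 ties each file name to the record's `image/id`. The sketch below assumes the annotation records have already been loaded as dicts that carry an `"image/id"` key; the rest of the JSON layout is not described here. The helpers `expected_image_path` and `missing_images` are hypothetical. They compute target paths and list the ids that still need downloading, but do not fetch anything.

```python
import os
from typing import Iterable, List, Mapping


def expected_image_path(record: Mapping[str, str], root: str = "dataset_no_images") -> str:
    """Path where step 3 says the image for one annotation record should be saved."""
    # File name = the record's "image/id" plus a .jpg extension.
    return os.path.join(root, "noisy_images", f"{record['image/id']}.jpg")


def missing_images(records: Iterable[Mapping[str, str]],
                   root: str = "dataset_no_images") -> List[str]:
    """Return the ids of records whose image file is not on disk yet."""
    return [r["image/id"] for r in records
            if not os.path.isfile(expected_image_path(r, root))]


print(expected_image_path({"image/id": "5922767e5677aef4"}))
# dataset_no_images/noisy_images/5922767e5677aef4.jpg (with / as the path separator)
```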

TFDS can then process the resulting directory structure:

dataset_no_images/
- mini-imagenet/
- class_name.txt
- split/
  - blue_noise_nl_0.0
  - blue_noise_nl_0.1
  - ...
  - red_noise_nl_0.0
  - red_noise_nl_0.1
  - ...
  - clean_validation
- mini-imagenet-annotations.json
ILSVRC2012_img_train.tar
ILSVRC2012_img_val.tar
noisy_images/
- 5922767e5677aef4.jpg
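Before running TFDS, it can help to confirm that the manual directory holds the files named above. Below is a minimal sketch, assuming the default `~/tensorflow_datasets/downloads/manual/` location. `missing_entries` and `REQUIRED_ENTRIES` are hypothetical. The list is deliberately partial: it covers only the annotation file and the two ImageNet archives, whose locations are stated explicitly, so extend it as needed.

```python
import os
from typing import List, Sequence

# Default manual directory from the instructions above.
DEFAULT_MANUAL_DIR = os.path.expanduser("~/tensorflow_datasets/downloads/manual/")

# Hypothetical, partial list of entries, relative to the manual directory.
REQUIRED_ENTRIES = (
    "dataset_no_images/mini-imagenet-annotations.json",
    "ILSVRC2012_img_train.tar",
    "ILSVRC2012_img_val.tar",
)


def missing_entries(manual_dir: str = DEFAULT_MANUAL_DIR,
                    required: Sequence[str] = REQUIRED_ENTRIES) -> List[str]:
    """Return the required relative paths that do not exist under manual_dir."""
    return [rel for rel in required
            if not os.path.exists(os.path.join(manual_dir, *rel.split("/")))]


if __name__ == "__main__":
    for entry in missing_entries():
        print("missing:", entry)
```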
Auto-cached (documentation): No
Feature structure:

FeaturesDict({
    'id': Text(shape=(), dtype=string),
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'is_clean': bool,
    'label': ClassLabel(shape=(), dtype=int64, num_classes=100),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict		
id	Text		string
image	Image	(None, None, 3)	uint8
is_clean	Tensor		bool
label	ClassLabel		int64
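Suppose `is_clean` is `True` exactly when an example's label is correct. That reading is inferred from the feature name and is not stated above. Under that assumption, the share of noisy labels in a set of examples is one minus the mean of the flag. Below is a minimal pure-Python sketch; `empirical_noise_rate` is a hypothetical helper, and the flags could come from iterating over any split.

```python
from typing import Iterable


def empirical_noise_rate(is_clean_flags: Iterable[bool]) -> float:
    """Fraction of examples whose is_clean flag is False."""
    flags = [bool(f) for f in is_clean_flags]
    if not flags:
        raise ValueError("need at least one example")
    # True counts as 1, so sum(flags) is the number of clean examples.
    return 1.0 - sum(flags) / len(flags)


print(empirical_noise_rate([True, True, False, True]))  # 0.25
```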

Supervised keys (see as_supervised doc): ('image', 'label')
Citation:

@inproceedings{jiang2020beyond,
  title={Beyond synthetic noise: Deep learning on controlled noisy labels},
  author={Jiang, Lu and Huang, Di and Liu, Mason and Yang, Weilong},
  booktitle={International Conference on Machine Learning},
  pages={4804--4815},
  year={2020},
  organization={PMLR}
}

controlled_noisy_web_labels/mini_imagenet_red (default config)

Dataset size: 1.19 GiB
Splits:

Split	Examples
`'train_00'`	50,000
`'train_05'`	50,000
`'train_10'`	50,000
`'train_15'`	50,000
`'train_20'`	50,000
`'train_30'`	49,985
`'train_40'`	50,010
`'train_50'`	49,962
`'train_60'`	50,000
`'train_80'`	50,008
`'validation'`	5,000
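The noise level p can be read off a split name. Assuming p is the fraction of incorrectly labelled training images, a split with N examples contains roughly p * N mislabelled examples. The sketch below applies this to the Red sizes from the table above; `approx_noisy_count` and `RED_TRAIN_SIZES` are hypothetical names.

```python
from typing import Dict

# Example counts for the Red training splits, copied from the table above.
RED_TRAIN_SIZES: Dict[str, int] = {
    "train_00": 50_000, "train_05": 50_000, "train_10": 50_000,
    "train_15": 50_000, "train_20": 50_000, "train_30": 49_985,
    "train_40": 50_010, "train_50": 49_962, "train_60": 50_000,
    "train_80": 50_008,
}


def approx_noisy_count(split: str, sizes: Dict[str, int] = RED_TRAIN_SIZES) -> int:
    """Approximate number of mislabelled examples in a split: round(p * N)."""
    p = int(split.split("_")[1]) / 100  # "train_80" -> 0.8
    return round(p * sizes[split])


print(approx_noisy_count("train_80"))  # 40006
```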

Figure (tfds.show_examples):

Examples (tfds.as_dataframe):

controlled_noisy_web_labels/mini_imagenet_blue

Dataset size: 1.39 GiB
Splits:

Split	Examples
`'train_00'`	60,000
`'train_05'`	60,000
`'train_10'`	60,000
`'train_15'`	60,000
`'train_20'`	60,000
`'train_30'`	60,000
`'train_40'`	60,000
`'train_50'`	60,000
`'train_60'`	60,000
`'train_80'`	60,000
`'validation'`	5,000

Figure (tfds.show_examples):

Examples (tfds.as_dataframe):