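laion400m

Description:

The LAION-400M dataset is completely open and freely accessible.

See https://laion.ai/laion-400-open-dataset/ for the full description of this dataset.

All images and texts in the LAION-400M dataset have been filtered with OpenAI's CLIP by calculating the cosine similarity between the text and image embeddings and dropping pairs with a similarity below 0.3. The threshold of 0.3 was determined through human evaluations and appeared to be a good heuristic for estimating semantic image-text content matching.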
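
The filtering rule described above can be illustrated with a minimal sketch. It is not the LAION pipeline itself: the function names are hypothetical, and the toy vectors stand in for real CLIP embeddings. Only the 0.3 threshold and the use of cosine similarity come from the description.

```python
import math

# Threshold stated in the dataset description; pairs below it were dropped.
CLIP_SIMILARITY_THRESHOLD = 0.3


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keep_pair(image_embedding, text_embedding,
              threshold=CLIP_SIMILARITY_THRESHOLD):
    """Hypothetical helper: True when the pair would survive the filter."""
    return cosine_similarity(image_embedding, text_embedding) >= threshold


# Toy 2-D vectors, used only to exercise the rule.
aligned = keep_pair([1.0, 0.0], [1.0, 1.0])     # similarity is about 0.71
orthogonal = keep_pair([1.0, 0.0], [0.0, 1.0])  # similarity is 0.0
```

A pair whose similarity is exactly 0.3 is kept here, because the description only says that values below 0.3 were dropped.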

The image-text pairs were extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.

Additional Documentation: Explore on Papers With Code
Homepage: https://laion.ai/blog/laion-400-open-dataset/
Source code: tfds.vision_language.laion400m.Laion400m
Versions:
- 1.0.0 (default): Initial release.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/).
Refer to the "Download Information" section of https://laion.ai/blog/laion-400-open-dataset/
Auto-cached (documentation): Unknown
Splits:

Split	Examples

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:

@article{DBLP:journals/corr/abs-2111-02114,
  author    = {Christoph Schuhmann and
               Richard Vencu and
               Romain Beaumont and
               Robert Kaczmarczyk and
               Clayton Mullis and
               Aarush Katta and
               Theo Coombes and
               Jenia Jitsev and
               Aran Komatsuzaki},
  title     = {{LAION-400M:} Open Dataset of CLIP-Filtered 400 Million Image-Text
               Pairs},
  journal   = {CoRR},
  volume    = {abs/2111.02114},
  year      = {2021},
  url       = {https://arxiv.org/abs/2111.02114},
  eprinttype = {arXiv},
  eprint    = {2111.02114},
  timestamp = {Fri, 05 Nov 2021 15:25:54 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2111-02114.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
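
For the manual download step listed in the metadata above, the following is a rough sketch of how a custom manual_dir might be passed to TFDS. The helper names `default_manual_dir` and `load_laion400m` are hypothetical. The sketch assumes the source files were already placed in the manual directory as described on the LAION page, and it returns every available split rather than assuming a split name, since the splits table on this page is empty.

```python
from pathlib import Path


def default_manual_dir(data_dir="~/tensorflow_datasets"):
    """Manual-download directory under a TFDS data_dir (default documented above)."""
    return Path(data_dir).expanduser() / "downloads" / "manual"


def load_laion400m(config="images", manual_dir=None):
    """Hypothetical helper: prepare laion400m/<config> from manually downloaded files."""
    # Imported lazily so the path helper works without TFDS installed.
    import tensorflow_datasets as tfds

    download_config = tfds.download.DownloadConfig(
        manual_dir=str(manual_dir or default_manual_dir())
    )
    builder = tfds.builder(f"laion400m/{config}")
    builder.download_and_prepare(download_config=download_config)
    # With no split argument, as_dataset() returns a dict of all splits.
    return builder.as_dataset()
```

Calling `load_laion400m("embeddings")` would select the second configuration; preparation fails if the manual directory does not contain the expected files.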

laion400m/images (default config)

Feature structure:

FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'image': Image(shape=(None, None, 3), dtype=uint8, description=image),
    'license': Text(shape=(), dtype=string),
    'nsfw': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'original_height': Scalar(shape=(), dtype=int32, description=original height of the image),
    'original_width': Scalar(shape=(), dtype=int32, description=original width of the image),
    'similarity': Scalar(shape=(), dtype=float64, description=cosine similarity score between the text and image embedding. Missing values default to -1.0),
    'url': Text(shape=(), dtype=string),
})

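Feature documentation:

Feature	Class	Shape	Dtype	Description	Value range
	FeaturesDict
caption	Text		string	HTML alt-text attribute
image	Image	(None, None, 3)	uint8	image
license	Text		string	Type of Creative Commons license (if applicable)
nsfw	ClassLabel		int64	NSFW tag (detected with CLIP). Inconsistent and missing tags are replaced with UNTAGGED.
original_height	Scalar		int32	Original height of the image
original_width	Scalar		int32	Original width of the image
similarity	Scalar		float64	Cosine similarity score between the text and image embeddings. Missing values default to -1.0.	[0.0, 1.0]
url	Text		string	Image URL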
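
Because missing similarity scores default to -1.0, which lies outside the documented [0.0, 1.0] range, aggregate statistics can be skewed unless the sentinel is excluded. The sketch below shows one way to handle this on plain Python dictionaries that mimic decoded examples; the helper names are hypothetical and not part of TFDS.

```python
# Sentinel documented in the feature table for missing similarity scores.
MISSING_SIMILARITY = -1.0


def has_similarity(example):
    """True when the example carries a real similarity score."""
    return example["similarity"] != MISSING_SIMILARITY


def mean_similarity(examples):
    """Average similarity over examples with a score, or None if there are none."""
    scores = [ex["similarity"] for ex in examples if has_similarity(ex)]
    return sum(scores) / len(scores) if scores else None


# Toy records standing in for decoded dataset examples.
records = [
    {"caption": "a cat", "similarity": 0.4},
    {"caption": "untitled", "similarity": -1.0},  # score missing
    {"caption": "a dog", "similarity": 0.6},
]
average = mean_similarity(records)
```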

laion400m/embeddings

Feature structure:

FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'image_embedding': Tensor(shape=(512,), dtype=float16, description=CLIP image embedding),
    'license': Text(shape=(), dtype=string),
    'nsfw': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'original_height': Scalar(shape=(), dtype=int32, description=original height of the image),
    'original_width': Scalar(shape=(), dtype=int32, description=original width of the image),
    'similarity': Scalar(shape=(), dtype=float64, description=cosine similarity score between the text and image embedding. Missing values default to -1.0),
    'text_embedding': Tensor(shape=(512,), dtype=float16, description=CLIP text embedding),
    'url': Text(shape=(), dtype=string),
})

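Feature documentation:

Feature	Class	Shape	Dtype	Description	Value range
	FeaturesDict
caption	Text		string	HTML alt-text attribute
image_embedding	Tensor	(512,)	float16	CLIP image embedding
license	Text		string	Type of Creative Commons license (if applicable)
nsfw	ClassLabel		int64	NSFW tag (detected with CLIP). Inconsistent and missing tags are replaced with UNTAGGED.
original_height	Scalar		int32	Original height of the image
original_width	Scalar		int32	Original width of the image
similarity	Scalar		float64	Cosine similarity score between the text and image embeddings. Missing values default to -1.0.	[0.0, 1.0]
text_embedding	Tensor	(512,)	float16	CLIP text embedding
url	Text		string	Image URL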
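
The embeddings are stored as float16 vectors of length 512, so it is usually safer to upcast them before doing arithmetic. The sketch below ranks a stack of image embeddings against one text embedding by cosine similarity. The function name `rank_images_by_caption` is hypothetical, the example uses tiny 3-D toy vectors instead of real CLIP embeddings, and it normalizes explicitly because this page does not say whether the stored vectors are unit length.

```python
import numpy as np


def rank_images_by_caption(text_embedding, image_embeddings):
    """Return (indices, scores) of images sorted by descending cosine similarity."""
    # Upcast from float16 to float32 to limit rounding error in the dot products.
    text = np.asarray(text_embedding, dtype=np.float32)
    images = np.asarray(image_embeddings, dtype=np.float32)

    # Normalize explicitly; unit-length inputs are not assumed.
    text = text / np.linalg.norm(text)
    images = images / np.linalg.norm(images, axis=1, keepdims=True)

    scores = images @ text       # one cosine similarity per image row
    order = np.argsort(-scores)  # indices sorted by descending score
    return order, scores[order]


# Toy data: three "image" vectors and one "text" vector, all float16.
toy_text = np.array([1.0, 0.0, 0.0], dtype=np.float16)
toy_images = np.array(
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]], dtype=np.float16
)
order, sorted_scores = rank_images_by_caption(toy_text, toy_images)
```

With real data, each row of `image_embeddings` would be the `image_embedding` feature of one example and `text_embedding` the `text_embedding` feature of the query example.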
