laion400m

Description:

The LAION-400M dataset is completely open and freely accessible.

For a complete description of this dataset, see https://laion.ai/laion-400-open-dataset/.

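All images and texts in the LAION-400M dataset have been filtered with OpenAI's CLIP: the cosine similarity between the text and image embeddings is computed, and pairs with a similarity below 0.3 are dropped. The 0.3 threshold was determined through human evaluation and appears to be a good heuristic for estimating whether the image and text content match semantically.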
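The filtering rule described above can be sketched in a few lines. This is a minimal illustration, not the LAION pipeline itself: the function names `cosine_similarity` and `filter_pairs` are hypothetical, the embeddings are assumed to be plain lists of floats, and treating a zero vector as a non-match is an assumption made here to avoid division by zero.

```python
import math

SIMILARITY_THRESHOLD = 0.3  # threshold stated in the dataset description


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector never counts as a match
    return dot / (norm_a * norm_b)


def filter_pairs(pairs, threshold=SIMILARITY_THRESHOLD):
    """Keep (image_embedding, text_embedding) pairs scoring at least `threshold`."""
    return [
        (img, txt)
        for img, txt in pairs
        if cosine_similarity(img, txt) >= threshold
    ]
```

Pairs whose score falls below the threshold are dropped; everything else passes through unchanged.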

The image-text pairs were extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.

Additional Documentation: Explore on Papers With Code
Homepage: https://laion.ai/blog/laion-400-open-dataset/
Source code: tfds.vision_language.laion400m.Laion400m
Versions:
- 1.0.0 (default): Initial release.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Refer to the "Download Information" section on https://laion.ai/blog/laion-400-open-dataset/
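Once the source files are in place, pointing TFDS at the manual directory might look like the sketch below. The helper names `resolve_manual_dir` and `prepare_laion400m` are hypothetical; `tfds.builder`, `tfds.download.DownloadConfig(manual_dir=...)`, `download_and_prepare`, and `as_dataset` are standard TFDS calls. The sketch assumes the files have already been downloaded as the LAION page describes, and it returns every split rather than naming one, since the split table on this page is empty.

```python
import os


def resolve_manual_dir(manual_dir="~/tensorflow_datasets/downloads/manual/"):
    """Expand '~' and return an absolute path to the manual download directory."""
    return os.path.abspath(os.path.expanduser(manual_dir))


def prepare_laion400m(config="laion400m/images", manual_dir=None):
    """Build the dataset from manually downloaded files.

    Requires tensorflow_datasets; it is imported lazily because it is a
    heavy dependency.
    """
    import tensorflow_datasets as tfds

    download_config = tfds.download.DownloadConfig(
        # None lets TFDS fall back to its default manual directory.
        manual_dir=resolve_manual_dir(manual_dir) if manual_dir else None
    )
    builder = tfds.builder(config)
    builder.download_and_prepare(download_config=download_config)
    # With no split argument, as_dataset returns a dict: split name -> dataset.
    return builder.as_dataset()
```

Passing `config="laion400m/embeddings"` would select the other configuration listed on this page.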
Auto-cached (documentation): Unknown
Splits:

Split	Examples

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:

@article{DBLP:journals/corr/abs-2111-02114,
  author    = {Christoph Schuhmann and
               Richard Vencu and
               Romain Beaumont and
               Robert Kaczmarczyk and
               Clayton Mullis and
               Aarush Katta and
               Theo Coombes and
               Jenia Jitsev and
               Aran Komatsuzaki},
  title     = { {LAION-400M:} Open Dataset of CLIP-Filtered 400 Million Image-Text
               Pairs},
  journal   = {CoRR},
  volume    = {abs/2111.02114},
  year      = {2021},
  url       = {https://arxiv.org/abs/2111.02114},
  eprinttype = {arXiv},
  eprint    = {2111.02114},
  timestamp = {Fri, 05 Nov 2021 15:25:54 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2111-02114.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

laion400m/images (default config)

Feature structure:

FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'image': Image(shape=(None, None, 3), dtype=uint8, description=image),
    'license': Text(shape=(), dtype=string),
    'nsfw': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'original_height': Scalar(shape=(), dtype=int32, description=original height of the image),
    'original_width': Scalar(shape=(), dtype=int32, description=original width of the image),
    'similarity': Scalar(shape=(), dtype=float64, description=cosine similarity score between the text and image embedding. Missing values default to -1.0),
    'url': Text(shape=(), dtype=string),
})

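Feature documentation:

Feature	Class	Shape	Dtype	Description	Value range
	FeaturesDict
caption	Text		string	HTML alt-text attribute
image	Image	(None, None, 3)	uint8	image
license	Text		string	type of Creative Commons license (if applicable)
nsfw	ClassLabel		int64	NSFW tag (detected with CLIP). Incohesive and missing tags are replaced with UNTAGGED
original_height	Scalar		int32	original height of the image
original_width	Scalar		int32	original width of the image
similarity	Scalar		float64	cosine similarity score between the text and image embedding. Missing values default to -1.0	[0.0, 1.0]
url	Text		string	image URL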
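Because missing `similarity` values default to -1.0, which lies outside the documented [0.0, 1.0] range, consumers may want to treat that value as a sentinel rather than a real score. The sketch below shows one way to do so on examples represented as plain dictionaries keyed like the feature structure above. The helper names and the `min_similarity` parameter are hypothetical and not part of TFDS.

```python
MISSING_SIMILARITY = -1.0  # default for missing values, per the feature description


def similarity_or_none(example):
    """Return the similarity score, or None when it is the missing-value sentinel."""
    score = float(example["similarity"])
    return None if score == MISSING_SIMILARITY else score


def keep_example(example, min_similarity):
    """Keep examples whose score is present and at least `min_similarity`."""
    score = similarity_or_none(example)
    return score is not None and score >= min_similarity
```

For instance, `keep_example(example, 0.35)` would apply a cutoff stricter than the 0.3 used to build the dataset while also skipping examples with no recorded score.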

laion400m/embeddings

Feature structure:

FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'image_embedding': Tensor(shape=(512,), dtype=float16, description=CLIP image embedding),
    'license': Text(shape=(), dtype=string),
    'nsfw': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'original_height': Scalar(shape=(), dtype=int32, description=original height of the image),
    'original_width': Scalar(shape=(), dtype=int32, description=original width of the image),
    'similarity': Scalar(shape=(), dtype=float64, description=cosine similarity score between the text and image embedding. Missing values default to -1.0),
    'text_embedding': Tensor(shape=(512,), dtype=float16, description=CLIP text embedding),
    'url': Text(shape=(), dtype=string),
})

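Feature documentation:

Feature	Class	Shape	Dtype	Description	Value range
	FeaturesDict
caption	Text		string	HTML alt-text attribute
image_embedding	Tensor	(512,)	float16	CLIP image embedding
license	Text		string	type of Creative Commons license (if applicable)
nsfw	ClassLabel		int64	NSFW tag (detected with CLIP). Incohesive and missing tags are replaced with UNTAGGED
original_height	Scalar		int32	original height of the image
original_width	Scalar		int32	original width of the image
similarity	Scalar		float64	cosine similarity score between the text and image embedding. Missing values default to -1.0	[0.0, 1.0]
text_embedding	Tensor	(512,)	float16	CLIP text embedding
url	Text		string	image URL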
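Since this configuration ships both CLIP embeddings alongside the stored `similarity` score, one possible sanity check is to recompute the cosine similarity from the embeddings and compare it with the stored value. The sketch below assumes examples are dictionaries of NumPy arrays keyed like the feature structure above. The function names and the tolerance `atol` are hypothetical, and some disagreement is to be expected because the embeddings are stored as float16.

```python
import numpy as np


def recomputed_similarity(example):
    """Cosine similarity of the stored CLIP embeddings, computed in float32."""
    # Upcast from float16 so the dot product and norms do not lose precision.
    img = np.asarray(example["image_embedding"], dtype=np.float32)
    txt = np.asarray(example["text_embedding"], dtype=np.float32)
    denom = np.linalg.norm(img) * np.linalg.norm(txt)
    if denom == 0.0:
        return float("nan")
    return float(np.dot(img, txt) / denom)


def similarity_matches(example, atol=1e-2):
    """True when the stored score agrees with the recomputed one within `atol`."""
    stored = float(example["similarity"])
    if stored == -1.0:  # missing-value sentinel from the feature description
        return False
    return abs(stored - recomputed_similarity(example)) <= atol
```

A looser or tighter `atol` may be appropriate depending on how the stored scores were originally computed.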
