tff.simulation.datasets.gldv2.load_data
Loads a federated version of the Google Landmark v2 dataset.
tff.simulation.datasets.gldv2.load_data(
    num_worker: int = 1,
    cache_dir: str = 'cache',
    gld23k: bool = False,
    base_url: str = GLD_SHARD_BASE_URL
)
The dataset consists of photos of various world landmarks, with images
grouped by photographer to achieve a federated partitioning of the data.
The dataset is downloaded and cached locally. If previously downloaded, it
tries to load the dataset from cache.
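A minimal usage sketch of the call described above, assuming a TFF release that still ships tff.simulation.datasets.gldv2. The helper names make_load_kwargs and load_gld are hypothetical, and the check on num_worker is an illustrative assumption rather than documented library behavior. The import is deferred so that nothing is downloaded until load_gld is actually called.

```python
def make_load_kwargs(num_worker=1, cache_dir="cache", gld23k=False):
    """Collects keyword arguments for gldv2.load_data (hypothetical helper)."""
    # Assumption for this sketch: at least one download thread is needed.
    if num_worker < 1:
        raise ValueError("num_worker must be at least 1")
    return {"num_worker": num_worker, "cache_dir": cache_dir, "gld23k": gld23k}


def load_gld(**overrides):
    """Downloads the dataset (or reads it from cache) and returns load_data's result."""
    # Deferred import: requires the tensorflow_federated package.
    import tensorflow_federated as tff

    kwargs = make_load_kwargs(**overrides)
    return tff.simulation.datasets.gldv2.load_data(**kwargs)


# Example call (not executed here because it triggers a large download):
# train, test = load_gld(gld23k=True, cache_dir="cache", num_worker=4)
```

Per the description above, a later call with the same cache_dir should try to reuse the cached files instead of downloading them again.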
The tf.data.Datasets returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values:
'image/decoded': A tf.Tensor with dtype=tf.uint8 that corresponds to the pixels of the landmark images.
'class': A tf.Tensor with dtype=tf.int64 and shape [1], corresponding to the class label of the landmark ([0, 203) for gld23k, [0, 2028) for gld160k).
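The element structure above can be checked with a short sketch like the following. The names validate_example and first_example_label are hypothetical helpers, not part of TFF. The code only relies on the documented keys, the shape-[1] label, and the half-open label ranges, so it also works with plain Python stand-ins for the tensors.

```python
# Half-open label ranges [0, n) quoted in the documentation above.
NUM_CLASSES = {"gld23k": 203, "gld160k": 2028}
EXPECTED_KEYS = ("image/decoded", "class")


def validate_example(example, flavor="gld23k"):
    """Checks one dataset element and returns its class label as an int."""
    missing = [key for key in EXPECTED_KEYS if key not in example]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    # 'class' has shape [1]: take its single entry and convert it to an int.
    label = int(example["class"][0])
    if not 0 <= label < NUM_CLASSES[flavor]:
        raise ValueError(f"label {label} outside [0, {NUM_CLASSES[flavor]})")
    return label


def first_example_label(client_data, flavor="gld23k"):
    """Validates the first element of the first client's dataset."""
    client_id = client_data.client_ids[0]
    dataset = client_data.create_tf_dataset_for_client(client_id)
    example = next(iter(dataset))
    return validate_example(example, flavor)
```

Note that the upper bound is exclusive: 202 is the largest valid label for gld23k, and 2027 for gld160k.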
Two flavors of the GLD dataset are available. When gld23k is true, a smaller
version of the federated Google Landmark dataset is provided for faster
iteration. The gld23k dataset contains 203 classes, 233 clients and 23,080
images. When gld23k is false, the gld160k dataset
(https://arxiv.org/abs/2003.08082) is provided. The gld160k dataset
contains 2,028 classes, 1,262 clients and 164,172 images.
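To make the difference in scale concrete, the figures above can be collected in a small sketch. GldFlavor and select_flavor are hypothetical names used only for illustration. The per-client average assumes the quoted image counts are the images partitioned across the clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GldFlavor:
    """Summary statistics for one flavor, as quoted in the text above."""
    name: str
    num_classes: int
    num_clients: int
    num_images: int

    @property
    def mean_images_per_client(self) -> float:
        return self.num_images / self.num_clients


GLD23K = GldFlavor("gld23k", 203, 233, 23_080)
GLD160K = GldFlavor("gld160k", 2_028, 1_262, 164_172)


def select_flavor(gld23k: bool) -> GldFlavor:
    """Mirrors the gld23k flag: True picks the small flavor, False the large one."""
    return GLD23K if gld23k else GLD160K


if __name__ == "__main__":
    for flavor in (GLD23K, GLD160K):
        print(f"{flavor.name}: {flavor.mean_images_per_client:.1f} images per client")
```

Under that assumption, a client holds roughly 99 images on average in gld23k and roughly 130 in gld160k.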
Args
num_worker | (Optional) The number of threads for downloading the GLD v2 dataset.
cache_dir | (Optional) The directory to cache the downloaded file. If None, caches in Keras' default cache directory.
gld23k | (Optional) When true, a smaller version of the federated Google Landmark v2 dataset will be loaded. This gld23k dataset is used for faster prototyping.
base_url | (Optional) The base url to download GLD v2 image shards.
Returns
Tuple of (train, test) where the tuple elements are a tff.simulation.datasets.ClientData and a tf.data.Dataset.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-20 UTC.