tff.simulation.datasets.cifar100.load_data
Loads a federated version of the CIFAR-100 dataset.
tff.simulation.datasets.cifar100.load_data(
cache_dir=None
)
The dataset is downloaded and cached locally. If previously downloaded, it
tries to load the dataset from cache.
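A minimal usage sketch follows, assuming TensorFlow Federated is installed and the download succeeds. The helper names `load_cifar100_clients`, `describe`, and `first_client_dataset` are hypothetical and exist only for illustration; the only library calls used are `load_data`, the `client_ids` property, and `create_tf_dataset_for_client`.

```python
def load_cifar100_clients(cache_dir=None):
    """Loads the federated CIFAR-100 (train, test) ClientData pair."""
    # Imported lazily so this sketch can be imported without TFF installed.
    import tensorflow_federated as tff

    # cache_dir=None falls back to Keras' default cache directory.
    return tff.simulation.datasets.cifar100.load_data(cache_dir=cache_dir)


def describe(client_data, max_ids=3):
    """Summarizes a ClientData-like object through its `client_ids` property."""
    ids = list(client_data.client_ids)
    return {'num_clients': len(ids), 'sample_ids': ids[:max_ids]}


def first_client_dataset(client_data):
    """Builds the tf.data.Dataset for the first listed client."""
    first_id = client_data.client_ids[0]
    return client_data.create_tf_dataset_for_client(first_id)
```

Calling `train, test = load_cifar100_clients()` and then `describe(train)` should report the 500 train clients described on this page; `first_client_dataset(train)` returns that client's `tf.data.Dataset`.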
The dataset is derived from the CIFAR-100 dataset
(https://www.cs.toronto.edu/~kriz/cifar.html). The training and testing
examples are partitioned across 500 and 100 clients, respectively. No two
clients share a data sample: the train clients form a true partition of the
CIFAR-100 training split, and the test clients form a true partition of the
CIFAR-100 testing split. Client IDs are strings: train client IDs are in the
range [0-499], and test client IDs are in the range [0-99].
The data is partitioned using a hierarchical Latent Dirichlet Allocation
(LDA) process, referred to as the Pachinko Allocation Method (PAM)
(https://people.cs.umass.edu/~mccallum/papers/pam-icml06.pdf).
This method uses a two-stage LDA process: each client has an associated
multinomial distribution over the coarse labels of CIFAR-100 and, for each
coarse label, a coarse-to-fine multinomial distribution over the fine labels
under that coarse label. The coarse label multinomial is drawn from a
symmetric Dirichlet with parameter 0.1, and each coarse-to-fine multinomial
is drawn from a symmetric Dirichlet with parameter 10. Each client has 100
samples. To generate a sample for a client, we first select a coarse label by
drawing from the coarse label multinomial, and then draw a fine label from
the coarse-to-fine multinomial of that coarse label. We then randomly draw a
sample from CIFAR-100 with that fine label (without replacement). If this
exhausts the set of samples with this label, we remove the label from the
coarse-to-fine multinomial and renormalize that multinomial.
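The two-stage sampling described above can be sketched in plain Python. This is an illustrative sketch, not the code used to build the dataset: the function names, the toy label hierarchy, and the per-label counts are placeholders, and skipping a coarse label once all of its fine labels are exhausted is an assumption that the description above does not spell out.

```python
import random


def sample_dirichlet(rng, alpha, size):
    """Draws from a symmetric Dirichlet(alpha) by normalizing Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for small alpha
        return [1.0 / size] * size
    return [d / total for d in draws]


def weighted_pick(rng, items, weights):
    """Picks one item proportionally to its weight (uniform if all are 0)."""
    if sum(weights) <= 0.0:
        return rng.choice(items)
    return rng.choices(items, weights=weights, k=1)[0]


def sample_client_labels(rng, remaining, coarse_to_fine, num_samples=100,
                         coarse_alpha=0.1, fine_alpha=10.0):
    """Returns the fine labels assigned to one client; mutates `remaining`.

    remaining: dict of fine label -> number of unassigned examples.
    coarse_to_fine: dict of coarse label -> list of fine labels under it.
    """
    coarse = list(coarse_to_fine)
    # Stage 1: per-client multinomial over coarse labels.
    coarse_p = dict(zip(coarse, sample_dirichlet(rng, coarse_alpha, len(coarse))))
    # Stage 2: per-client multinomial over fine labels, one per coarse label.
    fine_p = {
        c: dict(zip(fines, sample_dirichlet(rng, fine_alpha, len(fines))))
        for c, fines in coarse_to_fine.items()
    }
    labels = []
    for _ in range(num_samples):
        # Assumption: a coarse label with no examples left is skipped too.
        live = [c for c in coarse
                if any(remaining[f] > 0 for f in coarse_to_fine[c])]
        c = weighted_pick(rng, live, [coarse_p[x] for x in live])
        # Dropping exhausted fine labels and re-weighting the rest is the
        # "remove the label and renormalize" step.
        fines = [f for f in coarse_to_fine[c] if remaining[f] > 0]
        f = weighted_pick(rng, fines, [fine_p[c][x] for x in fines])
        remaining[f] -= 1  # sampling without replacement
        labels.append(f)
    return labels


# Toy hierarchy (placeholder): 4 coarse labels x 5 fine labels, 10 examples each.
toy_hierarchy = {c: [5 * c + i for i in range(5)] for c in range(4)}
toy_remaining = {f: 10 for fines in toy_hierarchy.values() for f in fines}
rng = random.Random(0)
toy_clients = [sample_client_labels(rng, toy_remaining, toy_hierarchy)
               for _ in range(2)]
print([len(labels) for labels in toy_clients])  # two clients, 100 labels each
```

With a coarse parameter of 0.1, most of each client's probability mass tends to fall on a few coarse labels, while the parameter of 10 spreads mass fairly evenly over the fine labels within a coarse label. In the toy run the two clients consume all 200 examples, so the exhaustion branch is exercised.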
Data set sizes:
- train: 50,000 examples
- test: 10,000 examples
The tf.data.Dataset objects returned by
tff.simulation.datasets.ClientData.create_tf_dataset_for_client yield
collections.OrderedDict objects at each iteration, with the following keys
and values, in lexicographic order by key:
- 'coarse_label': a tf.Tensor with dtype=tf.int64 and shape [1], the coarse label of the corresponding image. Labels are in the range [0-19].
- 'image': a tf.Tensor with dtype=tf.uint8 and shape [32, 32, 3], containing the red/green/blue (RGB) channel values of the image. Each value is in the range [0, 255].
- 'label': a tf.Tensor with dtype=tf.int64 and shape [1], the class label of the corresponding image. Labels are in the range [0-99].
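Given the element structure above, a per-element preprocessing step might look like the following sketch. It assumes TensorFlow is installed; the name `make_preprocess_fn` and the output keys `x` and `y` are illustrative choices, not part of this API.

```python
import collections


def make_preprocess_fn():
    """Returns a function mapping one dataset element to an (x, y) structure."""
    # Imported lazily so this sketch can be imported without TensorFlow.
    import tensorflow as tf

    def preprocess(element):
        # 'image' is uint8 in [0, 255]; scale it to float32 in [0, 1].
        image = tf.cast(element['image'], tf.float32) / 255.0
        # Keep the fine-grained class label; 'coarse_label' is dropped here.
        return collections.OrderedDict(x=image, y=element['label'])

    return preprocess
```

The returned function can be passed to `tf.data.Dataset.map`, for example `client_dataset.map(make_preprocess_fn())`, where `client_dataset` comes from `create_tf_dataset_for_client`.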
Args:
- cache_dir: (Optional) directory to cache the downloaded file. If None, caches in Keras' default cache directory.
Returns:
Tuple of (train, test) where the tuple elements are tff.simulation.datasets.ClientData objects.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-20 UTC.