Downloads and caches the dataset locally. If previously downloaded, tries to
load the dataset from cache.
This dataset is derived from the LEAF repository's
(https://github.com/TalwalkarLab/leaf) pre-processing of the Extended MNIST
dataset, which groups examples by writer. Details about LEAF were published in
"LEAF: A Benchmark for Federated Settings" (https://arxiv.org/abs/1812.01097).
Data set sizes:
only_digits=True: 3,383 users, 10 label classes
train: 341,873 examples
test: 40,832 examples
only_digits=False: 3,400 users, 62 label classes
train: 671,585 examples
test: 77,483 examples
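The counts above imply roughly how much data each writer contributes and how large the test share is. The following is a minimal sketch of that arithmetic in plain Python, using only the figures listed; the names `SIZES` and `summarize` are illustrative and not part of TFF.

```python
# Figures copied from the "Data set sizes" list; names are illustrative only.
SIZES = {
    True: {"users": 3383, "train": 341873, "test": 40832},   # only_digits=True
    False: {"users": 3400, "train": 671585, "test": 77483},  # only_digits=False
}


def summarize(only_digits):
    """Return (mean examples per user, fraction of all examples in test)."""
    s = SIZES[only_digits]
    total = s["train"] + s["test"]
    return total / s["users"], s["test"] / total


for flag in (True, False):
    per_user, test_fraction = summarize(flag)
    print(f"only_digits={flag}: {per_user:.1f} examples/user, "
          f"{test_fraction:.1%} in test")
```

These are means over all writers; the listed totals say nothing about how individual writers' counts are distributed.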
Rather than holding out specific users, each user's examples are split across
train and test so that every user has at least one example in train and at
least one example in test. Writers with fewer than two examples are excluded
from the data set.
The tf.data.Datasets returned by
tff.simulation.datasets.ClientData.create_tf_dataset_for_client yield
collections.OrderedDict objects at each iteration, with the following keys
and values, in lexicographic order by key:
'label': a tf.Tensor with dtype=tf.int32 and shape [1], the class
label of the corresponding pixels. Labels [0-9] correspond to the digit
classes, labels [10-35] correspond to the uppercase classes (e.g., label
11 is 'B'), and labels [36-61] correspond to the lowercase classes
(e.g., label 37 is 'b').
'pixels': a tf.Tensor with dtype=tf.float32 and shape [28, 28],
containing the pixels of the handwritten character, with values in
the range [0.0, 1.0].
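The label layout described above (digits, then uppercase, then lowercase) can be turned into a lookup from class label to character. This is a minimal sketch in plain Python; `EMNIST_CLASSES` and `label_to_char` are hypothetical helper names, not part of TFF, and the function expects a plain Python integer, so a scalar must first be extracted from the shape-[1] tensor.

```python
import string

# Index i holds the character for class label i:
# 0-9 digits, 10-35 uppercase letters, 36-61 lowercase letters.
EMNIST_CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def label_to_char(label):
    """Return the character for an integer class label in [0, 61]."""
    label = int(label)
    if not 0 <= label < len(EMNIST_CLASSES):
        raise ValueError(f"label must be in [0, 61], got {label}")
    return EMNIST_CLASSES[label]
```

With `only_digits=True`, only labels 0 through 9 occur, so the same lookup still applies.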
Args
only_digits: (Optional) Whether to include only examples from the digit
classes [0-9]. If False, also includes lowercase and uppercase characters,
for a total of 62 class labels.
cache_dir: (Optional) Directory in which to cache the downloaded file. If
None, caches in Keras' default cache directory.
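A call with both arguments might look like the sketch below. It assumes the `tensorflow_federated` package is installed; the wrapper names `load_federated_emnist` and `first_client_dataset` are hypothetical. The import is deferred so that defining the helpers does not itself require TFF or trigger a download.

```python
def load_federated_emnist(only_digits=True, cache_dir=None):
    """Load the (train, test) ClientData pair for Federated EMNIST.

    TFF is imported lazily, so only calling this function needs it.
    """
    import tensorflow_federated as tff  # assumption: TFF is installed

    return tff.simulation.datasets.emnist.load_data(
        only_digits=only_digits, cache_dir=cache_dir
    )


def first_client_dataset(client_data):
    """Return the tf.data.Dataset of the first client ID in client_data."""
    client_id = client_data.client_ids[0]
    return client_data.create_tf_dataset_for_client(client_id)
```

For example, `train, test = load_federated_emnist(only_digits=False)` downloads the data on first use and reads from the cache afterwards; `first_client_dataset(train)` then gives one writer's training examples.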
Returns
Tuple of (train, test) where the tuple elements are
tff.simulation.datasets.ClientData objects.
Signature
tff.simulation.datasets.emnist.load_data(only_digits=True, cache_dir=None)
Loads the Federated EMNIST dataset.
Last updated 2024-09-20 UTC.
Note: This dataset does not include some additional preprocessing that MNIST
includes, such as size-normalization and centering. In the Federated EMNIST
data, the value 1.0 corresponds to the background and 0.0 corresponds to the
color of the digits themselves; this is the inverse of some MNIST
representations, e.g. in tensorflow_datasets
(https://github.com/tensorflow/datasets/blob/master/docs/datasets.md#mnist),
where 0 corresponds to the background color and 255 represents the color of
the digit.
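The note on pixel polarity says that Federated EMNIST uses 1.0 for the background and 0.0 for the ink, the inverse of the 0-to-255 convention it cites for tensorflow_datasets. A minimal pure-Python sketch of the conversion follows; `to_mnist_convention` is a hypothetical helper that works on nested lists of floats. On a tf.Tensor or NumPy array, the same inversion is the elementwise expression `1.0 - pixels`.

```python
def to_mnist_convention(pixels):
    """Convert Federated EMNIST floats to MNIST-style 0-255 integers.

    Input:  rows of floats in [0.0, 1.0], where 1.0 is background.
    Output: rows of ints in [0, 255], where 0 is background.
    """
    return [[round((1.0 - value) * 255) for value in row] for row in pixels]
```

This only flips the polarity and rescales; it does not add the size-normalization or centering that the note says is missing.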