tff.simulation.datasets.shakespeare.load_data
Loads the federated Shakespeare dataset.
tff.simulation.datasets.shakespeare.load_data(
cache_dir: Optional[str] = None
) -> tuple[client_data.ClientData, client_data.ClientData]
Used in the notebooks
- Federated Learning for Text Generation (https://www.tensorflow.org/federated/tutorials/federated_learning_for_text_generation)
- Private Heavy Hitters (https://www.tensorflow.org/federated/tutorials/private_heavy_hitters)
Downloads and caches the dataset locally. If previously downloaded, tries to
load the dataset from cache.
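For orientation, a minimal sketch of a typical call; the cache directory path below is only an illustrative assumption, not a required value.

import tensorflow_federated as tff

# The first call downloads the archive and caches it; subsequent calls with
# the same cache_dir reuse the cached copy instead of re-downloading.
# '/tmp/tff_datasets' is just an example location.
train_data, test_data = tff.simulation.datasets.shakespeare.load_data(
    cache_dir='/tmp/tff_datasets')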
This dataset is derived from the Leaf repository's
(https://github.com/TalwalkarLab/leaf) pre-processing of the works of
Shakespeare, and is described in "LEAF: A Benchmark for Federated Settings"
(https://arxiv.org/abs/1812.01097).
The data set consists of 715 users (characters of Shakespeare plays), where
each example corresponds to a contiguous set of lines spoken by the character
in a given play.
Data set sizes:
- train: 16,068 examples
- test: 2,356 examples
Rather than holding out specific users, each user's examples are split across
train and test so that all users have at least one example in train and
one example in test. Characters that had fewer than 2 examples are excluded
from the data set.
The tf.data.Datasets returned by
tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield
collections.OrderedDict objects at each iteration, with the following keys
and values:
- 'snippets': a tf.Tensor with dtype=tf.string, the snippet of contiguous
  text.
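A sketch of how the returned ClientData objects might be consumed; the choice of client id and the number of elements taken below are arbitrary.

import tensorflow_federated as tff

train_data, test_data = tff.simulation.datasets.shakespeare.load_data()

# Each client id corresponds to a character from a Shakespeare play.
client_id = train_data.client_ids[0]
client_dataset = train_data.create_tf_dataset_for_client(client_id)

# Each element is a collections.OrderedDict with a single 'snippets' key
# holding a tf.string tensor of contiguous text spoken by that character.
for element in client_dataset.take(3):
    print(element['snippets'].numpy())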
Args
cache_dir: (Optional) directory to cache the downloaded file. If None,
caches in Keras' default cache directory.

Returns
Tuple of (train, test) where the tuple elements are
tff.simulation.datasets.ClientData objects.