Object to hold a federated dataset.
The federated dataset is represented as a list of client ids, and a function to look up the local dataset for each client id.
Each client's local dataset is represented as a tf.data.Dataset, but
generally this class (and the corresponding datasets hosted by TFF) can
easily be consumed by any Python-based ML framework as numpy arrays:
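
    import tensorflow as tf
    import tensorflow_federated as tff
    import tensorflow_datasets as tfds

    # Assumed setup (not in the original snippet): load the EMNIST dataset
    # hosted by TFF and take its client ids.
    emnist_train, _ = tff.simulation.datasets.emnist.load_data()
    sampled_client_ids = emnist_train.client_ids

    for client_id in sampled_client_ids[:5]:
        client_local_dataset = tfds.as_numpy(
            emnist_train.create_tf_dataset_for_client(client_id))
        # client_local_dataset is an iterable of structures of numpy arrays
        for example in client_local_dataset:
            print(example)
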
To construct ClientData objects for testing purposes, see the
tff.simulation.datasets.TestClientData class, which provides
an easy way to construct toy federated datasets.
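The structure described above (a list of client ids plus a function that looks up each client's local dataset) can be pictured with a plain-Python stand-in. This is only a conceptual sketch: ToyClientData and its method name are hypothetical, Python lists stand in for tf.data.Dataset objects, and it is not how TFF implements ClientData or TestClientData.

```python
from typing import Callable, Dict, List


class ToyClientData:
    """Hypothetical stand-in: client ids plus a per-client lookup function."""

    def __init__(self, client_ids: List[str],
                 dataset_fn: Callable[[str], List[int]]):
        self.client_ids = list(client_ids)
        self._dataset_fn = dataset_fn

    def create_dataset_for_client(self, client_id: str) -> List[int]:
        # Only ids listed in client_ids are valid lookups.
        if client_id not in self.client_ids:
            raise ValueError(f"Unknown client id: {client_id!r}")
        return self._dataset_fn(client_id)


# Made-up example data: each client holds its own local examples.
raw: Dict[str, List[int]] = {"a": [1, 2, 3], "b": [4, 5]}
toy = ToyClientData(list(raw), raw.__getitem__)

for cid in toy.client_ids:
    print(cid, toy.create_dataset_for_client(cid))
```

The point of the sketch is that the federated dataset is never materialized as one object; each client's data is produced on demand from its id.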
| Attributes | |
|---|---|
| client_ids | A list of string identifiers for clients in this dataset. |
| dataset_computation | A tff.Computation accepting a client ID, returning a dataset. |
| element_type_structure | The element type information of the client datasets. elements returned by datasets in this |
| serializable_dataset_fn | A callable accepting a client ID and returning a tf.data.Dataset. Note that this callable must be traceable by TF, as it will be used in the context of a |

Methods
create_tf_dataset_for_client
create_tf_dataset_for_client(
client_id: str
) -> tf.data.Dataset
Creates a new tf.data.Dataset containing the client training examples.
This function will create a dataset for a given client, given that
client_id is contained in the client_ids property of the ClientData.
Unlike create_dataset, this method need not be serializable.
| Args | |
|---|---|
| client_id | The string client_id for the desired client. |

| Returns | |
|---|---|
| A tf.data.Dataset object. | |

create_tf_dataset_from_all_clients
create_tf_dataset_from_all_clients(
seed: Optional[Union[int, Sequence[int]]] = None
) -> tf.data.Dataset
Creates a new tf.data.Dataset containing all client examples.
This function is intended for training centralized, non-distributed models (num_clients=1), which can be a useful point of comparison against federated models.
Currently, the implementation produces a dataset in which each client's examples appear contiguously and in order, one client after another, so additional shuffling should generally be performed.
| Args | |
|---|---|
| seed | Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |

| Returns | |
|---|---|
| A tf.data.Dataset object. | |

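The client-by-client ordering noted above can be illustrated with a small plain-Python sketch. It is not TFF's implementation: join_all_clients is a hypothetical helper, lists stand in for tf.data.Dataset objects, and using Python's random module for the seeded client order is an assumption made for illustration.

```python
import random
from typing import Dict, List, Optional


def join_all_clients(data: Dict[str, List[str]],
                     seed: Optional[int] = None) -> List[str]:
    """Concatenates per-client examples, visiting clients in a seeded order."""
    client_ids = sorted(data)
    random.Random(seed).shuffle(client_ids)  # the seed fixes the client order
    joined: List[str] = []
    for cid in client_ids:
        joined.extend(data[cid])  # each client's examples stay contiguous
    return joined


# Made-up example data.
data = {"a": ["a0", "a1"], "b": ["b0", "b1"], "c": ["c0"]}
joined = join_all_clients(data, seed=0)

# Example-level shuffle, so a centralized model does not see one client at a time.
shuffled = joined[:]
random.Random(1).shuffle(shuffled)
```

With real tf.data.Dataset objects, the example-level shuffle would typically be done with Dataset.shuffle and a suitably large buffer size.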
datasets
datasets(
limit_count: Optional[int] = None,
seed: Optional[Union[int, Sequence[int]]] = None
) -> Iterable[tf.data.Dataset]
Yields the tf.data.Dataset for each client in random order.
This function is intended for use building a static array of client data to be provided to the top-level federated computation.
| Args | |
|---|---|
| limit_count | Optional, a maximum number of datasets to return. |
| seed | Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |

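The interaction between limit_count and seed can be sketched in plain Python. This is an illustration under assumptions, not the library's code: iter_client_datasets is a hypothetical helper, lists stand in for tf.data.Dataset objects, and Python's random module is assumed for the seeded ordering.

```python
import random
from typing import Dict, Iterator, List, Optional


def iter_client_datasets(data: Dict[str, List[int]],
                         limit_count: Optional[int] = None,
                         seed: Optional[int] = None) -> Iterator[List[int]]:
    """Yields one dataset per client in a seeded random order."""
    client_ids = sorted(data)
    random.Random(seed).shuffle(client_ids)
    if limit_count is not None:
        client_ids = client_ids[:limit_count]  # cap the number of datasets
    for cid in client_ids:
        yield data[cid]


# Made-up example data.
data = {"a": [1], "b": [2, 3], "c": [4]}

# A static list of client datasets, e.g. the input for one simulated round.
round_data = list(iter_client_datasets(data, limit_count=2, seed=42))
```

Fixing the seed makes the selection reproducible, which helps when comparing runs of the same simulation.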
from_clients_and_tf_fn
@classmethod
from_clients_and_tf_fn(
client_ids: Iterable[str],
serializable_dataset_fn: Callable[[str], tf.data.Dataset]
) -> 'ClientData'
Constructs a ClientData based on the given function.
| Args | |
|---|---|
| client_ids | A non-empty list of strings to use as input to serializable_dataset_fn. |
| serializable_dataset_fn | A function that takes a client_id from the above list, and returns a tf.data.Dataset. This function must be serializable and usable within the context of a tf.function and tff.Computation. |

| Raises | |
|---|---|
| TypeError | If serializable_dataset_fn is a tff.Computation. |

| Returns | |
|---|---|
| A ClientData object. | |

preprocess
preprocess(
preprocess_fn: Callable[[tf.data.Dataset], tf.data.Dataset]
) -> 'ClientData'
Applies preprocess_fn to each client's data.
| Args | |
|---|---|
| preprocess_fn | A callable accepting a tf.data.Dataset and returning a preprocessed tf.data.Dataset. This function must be traceable by TF. |

| Returns | |
|---|---|
| A tff.simulation.datasets.ClientData. | |

| Raises | |
|---|---|
| IncompatiblePreprocessFnError | If preprocess_fn is a tff.Computation. |

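One way to picture applying preprocess_fn to each client's data is as function composition over the per-client lookup. The sketch below is plain Python and purely illustrative: the preprocess helper and DatasetFn alias are hypothetical, lists stand in for tf.data.Dataset objects, and whether TFF applies preprocessing lazily in this way is an assumption.

```python
from typing import Callable, Dict, List

DatasetFn = Callable[[str], List[int]]


def preprocess(dataset_fn: DatasetFn,
               preprocess_fn: Callable[[List[int]], List[int]]) -> DatasetFn:
    """Returns a new lookup that applies preprocess_fn to each client's data."""
    def preprocessed_fn(client_id: str) -> List[int]:
        return preprocess_fn(dataset_fn(client_id))
    return preprocessed_fn


# Made-up example data and a toy preprocessing step.
raw: Dict[str, List[int]] = {"a": [1, 2, 3], "b": [4, 5]}
double = lambda examples: [2 * x for x in examples]

lookup = preprocess(raw.__getitem__, double)
print(lookup("a"))  # [2, 4, 6]
```

The original data is left untouched; only the datasets returned by the new lookup are transformed.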
train_test_client_split
@classmethod
train_test_client_split(
client_data: 'ClientData',
num_test_clients: int,
seed: Optional[Union[int, Sequence[int]]] = None
) -> tuple['ClientData', 'ClientData']
Returns a pair of (train, test) ClientData.
This method partitions the clients of client_data into two ClientData
objects with disjoint sets of ClientData.client_ids. All clients in the
test ClientData are guaranteed to have non-empty datasets, but the
training ClientData may have clients with no data.
| Args | |
|---|---|
| client_data | The base ClientData to split. |
| num_test_clients | How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData. |
| seed | Optional seed to fix shuffling of clients before splitting. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |

| Returns | |
|---|---|
| A pair (train_client_data, test_client_data), where test_client_data has num_test_clients selected at random, subject to the constraint they each have at least 1 batch in their dataset. | |

| Raises | |
|---|---|
| ValueError | If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets. |

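The split rules described above (disjoint client ids, non-empty test clients, ValueError when the request cannot be met) can be sketched in plain Python. This is a hedged illustration rather than TFF's implementation: split_clients is a hypothetical helper that returns client ids instead of ClientData objects, lists stand in for datasets, and the shuffle-then-scan strategy is an assumption.

```python
import random
from typing import Dict, List, Optional, Tuple


def split_clients(data: Dict[str, List[int]], num_test_clients: int,
                  seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Returns (train_ids, test_ids): disjoint ids, non-empty test clients."""
    client_ids = sorted(data)
    if num_test_clients > len(client_ids) - 1:
        raise ValueError("num_test_clients leaves no clients for training.")
    random.Random(seed).shuffle(client_ids)  # the seed fixes the shuffle
    train_ids: List[str] = []
    test_ids: List[str] = []
    for cid in client_ids:
        # Hold out only clients that actually have data; the rest go to train.
        if len(test_ids) < num_test_clients and data[cid]:
            test_ids.append(cid)
        else:
            train_ids.append(cid)
    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")
    return train_ids, test_ids


# Made-up example data; client "b" has no examples.
data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train_ids, test_ids = split_clients(data, num_test_clients=2, seed=7)
```

In this sketch an empty client can only ever land in the training split, which mirrors the guarantee that test clients are non-empty while training clients may have no data.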