The implementation is based on tf.data.Dataset.from_tensor_slices. This
class is intended only for constructing toy federated datasets, especially
to support simulation tests. Using this for large datasets is not
recommended, as it requires putting all client data into the underlying
TensorFlow graph (which is memory intensive).
Args
tensor_slices_dict
A dictionary keyed by client_id, where values are
lists, tuples, or dicts for passing to
tf.data.Dataset.from_tensor_slices. Note that namedtuples and attrs
classes are not explicitly supported, but a user can convert their data
from those formats to a dict, and then use this class.
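The conversion described above can be sketched in plain Python. The record type `Example`, the client IDs, and the data are hypothetical; the snippet only builds the dictionary and does not call TensorFlow.

```python
from collections import namedtuple

# Hypothetical record type; namedtuples are not accepted as value structures.
Example = namedtuple("Example", ["x", "y"])

raw_data = {
    "client_a": Example(x=[[1.0, 2.0], [3.0, 4.0]], y=[0, 1]),
    "client_b": Example(x=[[5.0, 6.0]], y=[1]),
}

# _asdict() maps field names to values; wrapping in dict() ensures each
# value structure is strictly a plain dictionary.
tensor_slices_dict = {
    client_id: dict(record._asdict())
    for client_id, record in raw_data.items()
}
```

The resulting `tensor_slices_dict` is keyed by client ID and has plain dict values, which is the shape this argument expects.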
Raises
ValueError
If a client with no data is found.
TypeError
If tensor_slices_dict is not a dictionary, or its value
structures are namedtuples, or its value structures are not either
strictly lists, strictly (standard, non-named) tuples, or strictly
dictionaries.
TypeError
If flattened values in tensor_slices_dict convert to different
TensorFlow data types.
Attributes
client_ids
A list of string identifiers for clients in this dataset.
dataset_computation
A tff.Computation accepting a client ID, returning a dataset.
element_type_structure
The element type information of the client datasets, that is, the type of
the elements returned by datasets in this ClientData object.
serializable_dataset_fn
A callable accepting a client ID and returning a tf.data.Dataset.
Note that this callable must be traceable by TF, as it will be used in the
context of a tf.function.
Creates a new tf.data.Dataset containing the client training examples.
This function will create a dataset for a given client, given that
client_id is contained in the client_ids property of the ClientData.
Unlike create_dataset, this method need not be serializable.
Creates a new tf.data.Dataset containing all client examples.
This function is intended for training centralized, non-distributed
models (num_clients=1). This can be useful as a point of comparison
against federated models.
Currently, the implementation produces a dataset in which all examples
from any single client appear together and in order, so additional
shuffling should generally be performed.
Args
seed
Optional, a seed to determine the order in which clients are
processed in the joined dataset. The seed can be any nonnegative 32-bit
integer, an array of such integers, or None.
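The ordering behavior of the joined dataset can be illustrated with a minimal plain-Python sketch. It is a stand-in, not the library implementation: `join_all_clients` and the example data are hypothetical, lists stand in for datasets, and Python's `random` module is used here, which accepts only an integer or None as a seed.

```python
import random


def join_all_clients(client_examples, seed=None):
    """Concatenates per-client example lists in a seeded client order."""
    client_ids = sorted(client_examples)
    random.Random(seed).shuffle(client_ids)  # the seed fixes the client order
    joined = []
    for client_id in client_ids:
        # Each client's examples stay grouped together and in order.
        joined.extend(client_examples[client_id])
    return joined


client_examples = {"a": ["a0", "a1", "a2"], "b": ["b0", "b1"], "c": ["c0"]}
joined = join_all_clients(client_examples, seed=42)

# Shuffling at the example level afterwards mixes clients together.
shuffled = list(joined)
random.Random(0).shuffle(shuffled)
```

Because each client's examples form one contiguous run in `joined`, a model trained on it without the extra shuffle would see one client at a time.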
This function is intended for use building a static array of client data
to be provided to the top-level federated computation.
Args
limit_count
Optional, a maximum number of datasets to return.
seed
Optional, a seed to determine the order in which the client datasets are
returned. The seed can be any nonnegative 32-bit
integer, an array of such integers, or None.
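How limit_count and seed interact can be sketched in plain Python. This is a hypothetical stand-in rather than the library code: `iter_client_datasets` is an invented name, lists stand in for datasets, and Python's `random` module (integer or None seeds only) replaces the real shuffling.

```python
import itertools
import random


def iter_client_datasets(client_examples, limit_count=None, seed=None):
    """Yields per-client data in a seeded order, up to limit_count items."""
    client_ids = sorted(client_examples)
    random.Random(seed).shuffle(client_ids)
    # islice with a stop of None yields every client.
    for client_id in itertools.islice(client_ids, limit_count):
        yield client_examples[client_id]


client_examples = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}

# Materialize a static list of at most two client datasets.
federated_data = list(iter_client_datasets(client_examples, limit_count=2, seed=1))
```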
Constructs a ClientData based on the given function.
Args
client_ids
A non-empty list of strings to use as input to
serializable_dataset_fn.
serializable_dataset_fn
A function that takes a client_id from the above
list, and returns a tf.data.Dataset. This function must be
serializable and usable within the context of a tf.function and
tff.Computation.
This method partitions the clients of client_data into two ClientData
objects with disjoint sets of ClientData.client_ids. All clients in the
test ClientData are guaranteed to have non-empty datasets, but the
training ClientData may have clients with no data.
Args
client_data
The base ClientData to split.
num_test_clients
How many clients to hold out for testing. This can be at
most len(client_data.client_ids) - 1, since we don't want to produce
empty ClientData.
seed
Optional seed to fix shuffling of clients before splitting. The seed
can be any nonnegative 32-bit integer, an array of such integers, or
None.
Returns
A pair (train_client_data, test_client_data), where test_client_data
has num_test_clients selected at random, subject to the constraint that
each has at least 1 batch in its dataset.
Raises
ValueError
If num_test_clients cannot be satisfied by client_data,
or too many clients have empty datasets.
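The split described above can be sketched in plain Python under stated assumptions. `split_clients` is a hypothetical helper, not the library implementation: it works on a dict of client ID to example list, returns lists of client IDs rather than ClientData objects, and uses Python's `random` module (integer or None seeds only) for the shuffle.

```python
import random


def split_clients(client_examples, num_test_clients, seed=None):
    """Partitions client IDs into (train_ids, test_ids)."""
    client_ids = sorted(client_examples)
    if num_test_clients > len(client_ids) - 1:
        raise ValueError("num_test_clients would leave no training clients.")
    random.Random(seed).shuffle(client_ids)

    train_ids, test_ids = [], []
    for client_id in client_ids:
        # Only clients that have data may be held out for testing.
        if len(test_ids) < num_test_clients and client_examples[client_id]:
            test_ids.append(client_id)
        else:
            train_ids.append(client_id)
    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")
    return train_ids, test_ids


data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train_ids, test_ids = split_clients(data, num_test_clients=2, seed=0)
```

In this sketch the empty client `"b"` can only land in the training partition, which matches the documented guarantee that test clients have non-empty datasets while training clients may be empty.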