Creates a new tf.data.Dataset containing the client training examples.
This function will create a dataset for a given client if client_id is
contained in the client_ids property of the FilePerUserClientData.
Unlike self.serializable_dataset_fn, this method is not serializable.
Creates a new tf.data.Dataset containing all client examples.
This function is intended for use in training centralized, non-distributed
models (num_clients=1). This can be useful as a point of comparison
against federated models.
Currently, the implementation produces a dataset in which all examples
from a single client appear consecutively, so additional shuffling
should generally be performed.
Args
seed
Optional, a seed to determine the order in which clients are
processed in the joined dataset. The seed can be any nonnegative 32-bit
integer, an array of such integers, or None.
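The joining behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `join_all_clients` and `CLIENT_EXAMPLES` are hypothetical names, lists stand in for tf.data.Dataset objects, and the standard-library `random.Random` stands in for whatever seeded generator the library uses.

```python
import random

# Hypothetical in-memory stand-in for per-client data (not a library type).
CLIENT_EXAMPLES = {
    "a": [1, 2, 3],
    "b": [10, 20],
    "c": [100],
}


def join_all_clients(client_examples, seed=None):
    """Concatenates every client's examples, visiting clients in a seeded order."""
    client_ids = sorted(client_examples)
    # The seed only controls the order in which clients are visited.
    random.Random(seed).shuffle(client_ids)
    joined = []
    for client_id in client_ids:
        joined.extend(client_examples[client_id])
    return joined


joined = join_all_clients(CLIENT_EXAMPLES, seed=42)

# Each client's examples are still adjacent in `joined`, so shuffle at the
# example level before using the result for centralized training.
shuffled = list(joined)
random.Random(0).shuffle(shuffled)
```

With real tf.data.Dataset objects, the example-level shuffle would typically be done with the dataset's own shuffle method rather than on a list.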
This function is intended for use in building a static array of client data
to be provided to the top-level federated computation.
Args
limit_count
Optional, a maximum number of datasets to return.
seed
Optional, a seed to determine the order in which clients are
processed in the joined dataset. The seed can be any nonnegative 32-bit
integer, an array of such integers, or None.
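As a rough sketch of how limit_count and seed interact, the following plain-Python generator yields per-client datasets in a seeded order and stops after limit_count of them. The names `iter_client_datasets` and `make_dataset` are hypothetical, lists stand in for tf.data.Dataset objects, and `random.Random` is an assumed stand-in for the library's seeded ordering.

```python
import itertools
import random


def iter_client_datasets(client_ids, dataset_fn, limit_count=None, seed=None):
    """Yields per-client datasets in a seeded order, stopping at limit_count."""
    order = list(client_ids)
    random.Random(seed).shuffle(order)
    # islice with a stop of None yields every client.
    for client_id in itertools.islice(order, limit_count):
        yield dataset_fn(client_id)


def make_dataset(client_id):
    # Hypothetical toy dataset: two example strings per client.
    return [f"{client_id}-0", f"{client_id}-1"]


client_ids = ["a", "b", "c", "d"]

# A static list holding at most two client datasets.
static_datasets = list(
    iter_client_datasets(client_ids, make_dataset, limit_count=2, seed=7)
)
```

Materializing the generator into a list gives the static collection of client datasets that can then be passed to a top-level computation.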
Constructs a ClientData based on the given function.
Args
client_ids
A non-empty list of strings to use as input to
serializable_dataset_fn.
serializable_dataset_fn
A function that takes a client_id from the above
list, and returns a tf.data.Dataset. This function must be
serializable and usable within the context of a tf.function and
tff.Computation.
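The contract between client_ids and the dataset function can be illustrated with a small plain-Python stand-in. This is only a sketch: `SketchClientData`, `dataset_for_client`, and `examples_for` are hypothetical names, lists replace tf.data.Dataset objects, and raising ValueError for an unknown client id is an assumption rather than documented behavior. It does not demonstrate serializability, which is specific to tf.function and tff.Computation.

```python
class SketchClientData:
    """Hypothetical stand-in: holds client ids and a per-client dataset function."""

    def __init__(self, client_ids, dataset_fn):
        if not client_ids:
            raise ValueError("client_ids must be a non-empty list")
        if not all(isinstance(c, str) for c in client_ids):
            raise TypeError("client_ids must be strings")
        self.client_ids = list(client_ids)
        self._dataset_fn = dataset_fn

    def dataset_for_client(self, client_id):
        # Assumption: an id outside client_ids is rejected with ValueError.
        if client_id not in self.client_ids:
            raise ValueError(f"unknown client_id: {client_id!r}")
        return self._dataset_fn(client_id)


def examples_for(client_id):
    # Depends only on its argument and captures no external state, which is
    # the shape a function needs before it can be made serializable.
    return [f"{client_id}-{i}" for i in range(3)]


client_data = SketchClientData(["alice", "bob"], examples_for)
alice_examples = client_data.dataset_for_client("alice")
```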
This method partitions the clients of client_data into two ClientData
objects with disjoint sets of ClientData.client_ids. All clients in the
test ClientData are guaranteed to have non-empty datasets, but the
training ClientData may have clients with no data.
Args
client_data
The base ClientData to split.
num_test_clients
How many clients to hold out for testing. This can be at
most len(client_data.client_ids) - 1, since we don't want to produce
empty ClientData.
seed
Optional seed to fix shuffling of clients before splitting. The seed
can be any nonnegative 32-bit integer, an array of such integers, or
None.
Returns
A pair (train_client_data, test_client_data), where test_client_data
contains num_test_clients clients selected at random, subject to the
constraint that they each have at least 1 batch in their dataset.
Raises
ValueError
If num_test_clients cannot be satisfied by client_data,
or too many clients have empty datasets.
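The partitioning rules above can be sketched in plain Python as follows. This is a minimal illustration, not the library's implementation: `split_clients` is a hypothetical name, a dict of lists stands in for ClientData, `random.Random` stands in for the library's seeded shuffle, and rejecting a non-positive num_test_clients is an assumption.

```python
import random


def split_clients(client_examples, num_test_clients, seed=None):
    """Splits client ids into (train_ids, test_ids) with non-empty test clients."""
    client_ids = sorted(client_examples)
    # At most len(client_ids) - 1 test clients, so the training side is never
    # empty. Rejecting values below 1 is an assumption of this sketch.
    if num_test_clients < 1 or num_test_clients >= len(client_ids):
        raise ValueError("num_test_clients cannot be satisfied")
    random.Random(seed).shuffle(client_ids)

    train_ids, test_ids = [], []
    for client_id in client_ids:
        # Only clients with data may be held out for testing; clients with no
        # data always fall through to the training side.
        if len(test_ids) < num_test_clients and client_examples[client_id]:
            test_ids.append(client_id)
        else:
            train_ids.append(client_id)

    if len(test_ids) < num_test_clients:
        raise ValueError("too many clients have empty datasets")
    return train_ids, test_ids


# Hypothetical toy data: clients "b" and "d" have no examples.
data = {"a": [1], "b": [], "c": [2, 3], "d": []}
train_ids, test_ids = split_clients(data, num_test_clients=2, seed=0)
```

In this toy data only two clients have examples, so both end up in the test split regardless of the seed, and the training split consists entirely of clients with no data, which the description above explicitly allows.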