Each client of the base_client_data is "expanded" into some number of
pseudo-clients. A serializable function fn(x) maps datapoint x to a new
datapoint, where the constructor of fn is parameterized by the expanded
client_id. For example, if the client_id "client_A" has two expansions,
"client_A-0" and "client_A-1", then make_transform_fn("client_A-0")(x) might be
the identity, while make_transform_fn("client_A-1")(x) could be a random
rotation of the image with the angle determined by a hash of the string
"client_A-1".
Args
base_client_data
A ClientData to expand.
make_transform_fn
A function to be called as
make_transform_fn(client_id), where client_id is the expanded client
id, which should return a function transform_fn that maps a datapoint
x whose element type structure corresponds to base_client_data to a
new datapoint x'. It must be traceable as a tf.function.
expand_client_id
An optional function that maps a client id of
base_client_data to a list of expanded client ids. If None, the
transformed data will have the same size and ids as the original.
reduce_client_id
A function that maps an expanded client id back to the
raw client id. Must be traceable as a tf.function. Must be specified
if and only if expand_client_id is.
Attributes
client_ids
A list of string identifiers for clients in this dataset.
dataset_computation
A tff.Computation accepting a client ID, returning a dataset.
element_type_structure
The element type information of the elements returned by datasets in
this ClientData object.
serializable_dataset_fn
A callable accepting a client ID and returning a tf.data.Dataset.
Note that this callable must be traceable by TF, as it will be used in the
context of a tf.function.
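One plausible way the per-client dataset function of a transforming ClientData could combine the constructor arguments is modeled below in plain Python: expand_client_id produces the expanded ids, reduce_client_id maps each one back to its raw id, and the dataset for an expanded id is the raw client's data with that id's transform applied to every element. The "-k" suffix naming scheme, the dict standing in for base_client_data, and the helpers expand_client_id, reduce_client_id, add_offset, and dataset_for_expanded_id are all hypothetical; the real class works with tf.data.Dataset objects rather than lists.

```python
from typing import Callable, Dict, List

NUM_EXPANSIONS = 2  # assumption: every raw client gets two pseudo-clients


def expand_client_id(client_id: str) -> List[str]:
    # "client_A" -> ["client_A-0", "client_A-1"]
    return [f"{client_id}-{k}" for k in range(NUM_EXPANSIONS)]


def reduce_client_id(expanded_id: str) -> str:
    # Drop the trailing "-k" suffix: "client_A-1" -> "client_A".
    return expanded_id.rsplit("-", 1)[0]


def add_offset(expanded_id: str) -> Callable[[int], int]:
    # Toy make_transform_fn: expansion k adds 10 * k to every element.
    k = int(expanded_id.rsplit("-", 1)[1])
    return lambda x: x + 10 * k


def dataset_for_expanded_id(
    base_data: Dict[str, List[int]],
    make_transform_fn: Callable[[str], Callable[[int], int]],
    expanded_id: str,
) -> List[int]:
    # Fetch the raw client's examples, then apply the id-specific transform.
    transform_fn = make_transform_fn(expanded_id)
    return [transform_fn(x) for x in base_data[reduce_client_id(expanded_id)]]


base_data = {"client_A": [1, 2, 3], "client_B": [4]}
expanded_ids = [e for c in base_data for e in expand_client_id(c)]
print(expanded_ids)
# ['client_A-0', 'client_A-1', 'client_B-0', 'client_B-1']
print(dataset_for_expanded_id(base_data, add_offset, "client_A-1"))
# [11, 12, 13]
```

Splitting on the last hyphen keeps raw ids that themselves contain hyphens intact, so reduce_client_id(e) == c holds for every e in expand_client_id(c), which is the round trip the two arguments are meant to satisfy together.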
create_tf_dataset_for_client(client_id)
Creates a new tf.data.Dataset containing the client training examples.
This function creates a dataset for the given client, provided that
client_id is contained in the client_ids property of the ClientData.
Unlike create_dataset, this method need not be serializable.
create_tf_dataset_from_all_clients(seed=None)
Creates a new tf.data.Dataset containing all client examples.
This function is intended for use in training centralized, non-distributed
models (num_clients=1). This can be useful as a point of comparison
against federated models.
Currently, the implementation produces a dataset in which all examples
from each client appear contiguously, one client after another, so
additional shuffling should generally be performed.
Args
seed
Optional, a seed to determine the order in which clients are
processed in the joined dataset. The seed can be any nonnegative 32-bit
integer, an array of such integers, or None.
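To make the ordering caveat concrete, the following plain-Python sketch imitates the described behavior: the client order is shuffled by seed, and each client's examples are then emitted contiguously, which is why a further example-level shuffle is usually needed. Lists stand in for tf.data.Dataset, and the helper name join_all_clients is hypothetical; random.Random here accepts an int or None, not the array-of-integers seeds the real method allows.

```python
import random
from typing import Dict, List, Optional


def join_all_clients(
    data: Dict[str, List[int]], seed: Optional[int] = None
) -> List[int]:
    client_ids = sorted(data)
    random.Random(seed).shuffle(client_ids)  # seed fixes the client order
    joined: List[int] = []
    for cid in client_ids:
        joined.extend(data[cid])  # each client's examples stay contiguous
    return joined


data = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
joined = join_all_clients(data, seed=0)
print(joined)  # client blocks in a seed-determined order
# Extra example-level shuffle, as recommended above:
random.Random(1).shuffle(joined)
```

With a real tf.data.Dataset, the extra step would typically be Dataset.shuffle(buffer_size); the buffer has to be larger than a single client's block of examples to actually mix examples across clients.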
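datasets(limit_count=None, seed=None)
Produces a tf.data.Dataset for each client, in an order determined by seed.
This function is intended for use in building a static array of client data
to be provided to the top-level federated computation.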
Args
limit_count
Optional, a maximum number of datasets to return.
seed
Optional, a seed to determine the order in which client datasets are
returned. The seed can be any nonnegative 32-bit integer, an array of such
integers, or None.
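A minimal sketch of the behavior these two arguments describe, in plain Python: client datasets are produced one at a time in a seed-determined order, stopping after limit_count of them. The function name iter_client_datasets and the list-based data are hypothetical stand-ins for the real method and its tf.data.Dataset objects.

```python
import itertools
import random
from typing import Dict, Iterator, List, Optional


def iter_client_datasets(
    data: Dict[str, List[int]],
    limit_count: Optional[int] = None,
    seed: Optional[int] = None,
) -> Iterator[List[int]]:
    client_ids = sorted(data)
    random.Random(seed).shuffle(client_ids)  # seed fixes the client order
    # islice with stop=None yields everything, matching limit_count=None.
    for cid in itertools.islice(client_ids, limit_count):
        yield data[cid]


data = {"a": [1], "b": [2, 3], "c": [4, 5, 6]}
# A static list of two client datasets, e.g. to feed a federated computation.
federated_input = list(iter_client_datasets(data, limit_count=2, seed=42))
print(federated_input)
```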
from_clients_and_tf_fn(client_ids, serializable_dataset_fn)
Constructs a ClientData based on the given function.
Args
client_ids
A non-empty list of strings to use as input to
serializable_dataset_fn.
serializable_dataset_fn
A function that takes a client_id from the above
list, and returns a tf.data.Dataset. This function must be
serializable and usable within the context of a tf.function and
tff.Computation.
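The constructor's contract can be sketched without TensorFlow: the object stores the non-empty id list and a dataset function, and a per-client lookup refuses ids outside client_ids, mirroring the precondition stated for the per-client dataset method. The class SimpleClientData and its method names are hypothetical, and this sketch does not model the requirement that the real serializable_dataset_fn be usable inside a tf.function and tff.Computation.

```python
from typing import Callable, List


class SimpleClientData:
    """Hypothetical, framework-free stand-in for a function-backed ClientData."""

    def __init__(self, client_ids: List[str], dataset_fn: Callable[[str], List[int]]):
        if not client_ids:
            raise ValueError("client_ids must be a non-empty list.")
        self.client_ids = list(client_ids)
        self._dataset_fn = dataset_fn

    def create_dataset_for_client(self, client_id: str) -> List[int]:
        # Only ids listed in client_ids may be requested.
        if client_id not in self.client_ids:
            raise ValueError(f"Unknown client id: {client_id!r}")
        return self._dataset_fn(client_id)


# Toy dataset_fn: each client's data is derived from its id alone.
cd = SimpleClientData(["x", "yy"], lambda cid: list(range(len(cid))))
print(cd.create_dataset_for_client("yy"))  # [0, 1]
```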
train_test_client_split(client_data, num_test_clients, seed=None)
This method partitions the clients of client_data into two ClientData
objects with disjoint sets of ClientData.client_ids. All clients in the
test ClientData are guaranteed to have non-empty datasets, but the
training ClientData may have clients with no data.
Args
client_data
The base ClientData to split.
num_test_clients
How many clients to hold out for testing. This can be at
most len(client_data.client_ids) - 1, so that the training ClientData
is not empty.
seed
Optional seed to fix shuffling of clients before splitting. The seed
can be any nonnegative 32-bit integer, an array of such integers, or
None.
Returns
A pair (train_client_data, test_client_data), where test_client_data
contains num_test_clients clients selected at random, subject to the
constraint that each has at least one batch in its dataset.
Raises
ValueError
If num_test_clients cannot be satisfied by client_data,
or too many clients have empty datasets.
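The selection rule described above can be sketched in plain Python: shuffle the client ids with the seed, take the first num_test_clients ids whose data is non-empty as the test set, and leave everything else (including empty clients) in the training set. The function name split_clients, the dict-of-lists representation, the lower bound of 1 on num_test_clients, and the error messages are assumptions of this sketch; checking for a non-empty list stands in for the "at least one batch" check on real datasets.

```python
import random
from typing import Dict, List, Optional, Tuple

Data = Dict[str, List[int]]


def split_clients(
    data: Data, num_test_clients: int, seed: Optional[int] = None
) -> Tuple[Data, Data]:
    # Assumed bounds: at least one test client, at least one training client.
    if not 1 <= num_test_clients <= len(data) - 1:
        raise ValueError("num_test_clients must be in [1, len(client_ids) - 1].")
    client_ids = sorted(data)
    random.Random(seed).shuffle(client_ids)
    test_ids: List[str] = []
    for cid in client_ids:
        if len(test_ids) == num_test_clients:
            break
        if data[cid]:  # only clients with data are eligible for testing
            test_ids.append(cid)
    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")
    train = {c: data[c] for c in client_ids if c not in test_ids}
    test = {c: data[c] for c in test_ids}
    return train, test


data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train, test = split_clients(data, num_test_clients=2, seed=0)
print(sorted(train), sorted(test))
```

The two returned mappings are disjoint and together cover every client, matching the partition described above; an empty client such as "b" can only ever land in the training set.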