A tff.simulation.datasets.ClientData intended for test purposes.
Inherits From: ClientData
tff.simulation.datasets.TestClientData(
tensor_slices_dict
)
The implementation is based on tf.data.Dataset.from_tensor_slices.
This class is intended only for constructing toy federated datasets, especially
to support simulation tests. Using it for large datasets is not
recommended, because it requires putting all client data into the underlying
TensorFlow graph, which is memory intensive.
Args | |
---|---|
tensor_slices_dict | A dictionary keyed by client_id, where values are lists, tuples, or dicts for passing to tf.data.Dataset.from_tensor_slices. Note that namedtuples and attrs classes are not explicitly supported, but a user can convert their data from those formats to a dict, and then use this class. |
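Because namedtuples are not explicitly supported, one way to prepare the input is to convert each client's record to a plain dict first. The following is a minimal sketch in plain Python, not part of the library: the `Example` namedtuple, the `to_tensor_slices_dict` helper, and the client IDs are hypothetical names chosen for illustration. The final constructor call is left commented out because it assumes TensorFlow Federated is installed.

```python
import collections

# Hypothetical record type holding one client's features and labels.
Example = collections.namedtuple("Example", ["x", "y"])


def to_tensor_slices_dict(per_client):
    """Converts {client_id: namedtuple} into {client_id: plain dict}."""
    return {cid: dict(record._asdict()) for cid, record in per_client.items()}


# Hypothetical toy data for two clients.
raw = {
    "client_a": Example(x=[[1.0], [2.0]], y=[0, 1]),
    "client_b": Example(x=[[3.0]], y=[1]),
}
tensor_slices_dict = to_tensor_slices_dict(raw)

# Assuming TensorFlow Federated is available, the result could be passed on:
# import tensorflow_federated as tff
# client_data = tff.simulation.datasets.TestClientData(tensor_slices_dict)
```

Every client ends up with the same dict keys, which keeps the per-client structures consistent with one another.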
Attributes | |
---|---|
client_ids | A list of string identifiers for clients in this dataset. |
dataset_computation | A tff.Computation accepting a client ID, returning a dataset. |
element_type_structure | The element type information of the client datasets. elements returned by datasets in this |
serializable_dataset_fn | A callable accepting a client ID and returning a tf.data.Dataset. Note that this callable must be traceable by TF, as it will be used in the context of a |
Methods
create_tf_dataset_for_client
create_tf_dataset_for_client(
client_id
)
Creates a new tf.data.Dataset containing the client training examples.
This function will create a dataset for a given client, provided that
client_id is contained in the client_ids property of the ClientData.
Unlike create_dataset, this method need not be serializable.
Args | |
---|---|
client_id | The string client_id for the desired client. |
Returns | |
---|---|
A tf.data.Dataset object. |
create_tf_dataset_from_all_clients
create_tf_dataset_from_all_clients(
seed: Optional[Union[int, Sequence[int]]] = None
) -> tf.data.Dataset
Creates a new tf.data.Dataset containing all client examples.
This function is intended for use in training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.
Currently, the implementation produces a dataset in which all examples from a single client appear together and in order, so additional shuffling should generally be performed.
Args | |
---|---|
seed | Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |
Returns | |
---|---|
A tf.data.Dataset object. |
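To see why additional shuffling is usually needed, the ordering described above can be modeled in plain Python. This is only an illustrative sketch of the described behavior, not the library's implementation: `join_all_clients` and the toy data are hypothetical, and Python lists stand in for tf.data.Dataset objects.

```python
import random


def join_all_clients(examples_by_client, seed=None):
    """Concatenates per-client examples, visiting clients in a seeded random order."""
    client_ids = sorted(examples_by_client)
    random.Random(seed).shuffle(client_ids)
    joined = []
    for cid in client_ids:
        # Each client's examples stay contiguous and keep their original order.
        joined.extend((cid, example) for example in examples_by_client[cid])
    return joined


# Hypothetical toy data: three clients with a few integer examples each.
examples_by_client = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
joined = join_all_clients(examples_by_client, seed=0)

# Example-level shuffling breaks up the per-client runs.
shuffled = list(joined)
random.Random(1).shuffle(shuffled)
```

With an actual tf.data.Dataset, the analogous extra step would be a call such as `dataset.shuffle(buffer_size)` on the returned dataset.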
datasets
datasets(
limit_count: Optional[int] = None,
seed: Optional[Union[int, Sequence[int]]] = None
) -> Iterable[tf.data.Dataset]
Yields the tf.data.Dataset for each client in random order.
This function is intended for building a static array of client data to be provided to the top-level federated computation.
Args | |
---|---|
limit_count | Optional, a maximum number of datasets to return. |
seed | Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |
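The interaction between limit_count and seed can be pictured with a small generator. This is a plain-Python sketch under stated assumptions, not the library's code: `iter_client_datasets` and `make_dataset` are hypothetical names, and lists stand in for tf.data.Dataset objects.

```python
import itertools
import random


def iter_client_datasets(client_ids, make_dataset, limit_count=None, seed=None):
    """Yields make_dataset(client_id) for clients in a seeded random order."""
    order = list(client_ids)
    random.Random(seed).shuffle(order)
    # islice with a stop of None yields everything, so None means "no limit".
    for cid in itertools.islice(order, limit_count):
        yield make_dataset(cid)


# Hypothetical toy data; dict lookup plays the role of dataset construction.
toy = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
first_two = list(iter_client_datasets(toy, toy.__getitem__, limit_count=2, seed=42))
everything = list(iter_client_datasets(toy, toy.__getitem__))
```

Collecting the generator into a list, as above, gives the kind of static array of client data that the method is intended to build.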
from_clients_and_tf_fn
@classmethod
from_clients_and_tf_fn(
client_ids: Iterable[str],
serializable_dataset_fn: Callable[[str], tf.data.Dataset]
) -> 'ClientData'
Constructs a ClientData based on the given function.
Args | |
---|---|
client_ids | A non-empty list of strings to use as input to serializable_dataset_fn. |
serializable_dataset_fn | A function that takes a client_id from the above list, and returns a tf.data.Dataset. This function must be serializable and usable within the context of a tf.function and tff.Computation. |
Returns | |
---|---|
A ClientData object. |
preprocess
preprocess(
preprocess_fn: Callable[[tf.data.Dataset], tf.data.Dataset]
) -> 'ClientData'
Applies preprocess_fn to each client's data.
Args | |
---|---|
preprocess_fn | A callable accepting a tf.data.Dataset and returning a preprocessed tf.data.Dataset. This function must be traceable by TF. |
Returns | |
---|---|
A tff.simulation.datasets.ClientData. |
Raises | |
---|---|
IncompatiblePreprocessFnError | If preprocess_fn is a tff.Computation. |
train_test_client_split
@classmethod
train_test_client_split(
client_data: 'ClientData',
num_test_clients: int,
seed: Optional[Union[int, Sequence[int]]] = None
) -> Tuple['ClientData', 'ClientData']
Returns a pair of (train, test) ClientData.
This method partitions the clients of client_data into two ClientData
objects with disjoint sets of ClientData.client_ids. All clients in the
test ClientData are guaranteed to have non-empty datasets, but the
training ClientData may have clients with no data.
Args | |
---|---|
client_data | The base ClientData to split. |
num_test_clients | How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData. |
seed | Optional seed to fix shuffling of clients before splitting. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |
Returns | |
---|---|
A pair (train_client_data, test_client_data), where test_client_data has num_test_clients selected at random, subject to the constraint they each have at least 1 batch in their dataset. |
Raises | |
---|---|
ValueError | If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets. |
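The partitioning rules described for this method (disjoint client IDs, non-empty test clients, ValueError when the request cannot be met) can be sketched in plain Python. This is an illustration of the documented contract under stated assumptions, not the library's implementation: `split_clients` and the toy data are hypothetical, lists stand in for datasets, and the lower bound of 1 on num_test_clients is an assumption of this sketch.

```python
import random


def split_clients(examples_by_client, num_test_clients, seed=None):
    """Returns (train_ids, test_ids) with disjoint IDs and non-empty test clients."""
    client_ids = sorted(examples_by_client)
    # Upper bound follows the documentation; the lower bound of 1 is assumed here.
    if not 0 < num_test_clients < len(client_ids):
        raise ValueError("num_test_clients must be between 1 and len(client_ids) - 1.")
    random.Random(seed).shuffle(client_ids)
    train_ids, test_ids = [], []
    for cid in client_ids:
        # Only clients that actually have data may be held out for testing.
        if len(test_ids) < num_test_clients and examples_by_client[cid]:
            test_ids.append(cid)
        else:
            train_ids.append(cid)
    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")
    return train_ids, test_ids


# Hypothetical toy data: client "b" has no examples.
data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train_ids, test_ids = split_clients(data, num_test_clients=2, seed=0)
```

In this model the empty client "b" can only land on the training side, which mirrors the note that the training ClientData may contain clients with no data.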