A tff.simulation.datasets.ClientData backed by an SQL file.
Inherits From: ClientData
tff.simulation.datasets.SqlClientData(
    database_filepath: str, split_name: Optional[str] = None
)
This class expects that the SQL file has two tables: examples and
client_metadata.
Each row of the examples table corresponds to a sample in the dataset.
This table must contain at least the following three columns:
- split_name: TEXT column used to split test, holdout, and training examples.
- client_id: TEXT column identifying which user the example belongs to.
- serialized_example_proto: A serialized tf.train.Example protocol buffer containing the example data.
Each row of the client_metadata table corresponds to a client in the
dataset. This table must contain at least the following three columns:
- client_id: TEXT column used to identify the client.
- split_name: TEXT column used to split test, holdout, and training examples.
- num_examples: INTEGER column containing the number of examples held by this client.
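The schema described above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration that assumes the SQL file is an SQLite database; the helper name `create_client_database` and the `BLOB` column type for `serialized_example_proto` are assumptions made for this example, and the payloads are expected to be `tf.train.Example` protos that were already serialized elsewhere.

```python
import sqlite3


def create_client_database(path, examples):
    """Writes examples to an SQLite file with the two expected tables.

    `examples` is an iterable of (split_name, client_id, serialized_bytes)
    tuples. The bytes are assumed to be already-serialized tf.train.Example
    protos; this hypothetical helper does not validate them.
    """
    rows = list(examples)
    conn = sqlite3.connect(path)
    try:
        with conn:  # Commits on success, rolls back on error.
            # BLOB is an assumed storage type for the serialized proto.
            conn.execute(
                "CREATE TABLE examples ("
                "split_name TEXT, client_id TEXT, "
                "serialized_example_proto BLOB)"
            )
            conn.execute(
                "CREATE TABLE client_metadata ("
                "client_id TEXT, split_name TEXT, num_examples INTEGER)"
            )
            conn.executemany("INSERT INTO examples VALUES (?, ?, ?)", rows)
            # Derive one metadata row per (client, split) from the examples.
            conn.execute(
                "INSERT INTO client_metadata "
                "SELECT client_id, split_name, COUNT(*) FROM examples "
                "GROUP BY client_id, split_name"
            )
    finally:
        conn.close()
```

Deriving `client_metadata` from `examples` keeps `num_examples` consistent with the rows that are actually stored.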
| Attributes | |
|---|---|
| client_ids | A list of string identifiers for clients in this dataset. | 
| dataset_computation | A tff.Computation accepting a client ID, returning a dataset. | 
| element_type_structure | The element type information of the client datasets, that is, of the elements returned by datasets in this ClientData object. | 
| serializable_dataset_fn | A callable accepting a client ID and returning a tf.data.Dataset. Note that this callable must be traceable by TF, as it will be used in the context of a tf.function. | 
Methods
create_tf_dataset_for_client
create_tf_dataset_for_client(
    client_id: str
)
Creates a new tf.data.Dataset containing the client training examples.
This function will create a dataset for a given client if client_id is
contained in the client_ids property of the SqlClientData. Unlike
self.serializable_dataset_fn, this method is not serializable.
| Args | |
|---|---|
| client_id | The string identifier for the desired client. | 
| Returns | |
|---|---|
| A tf.data.Dataset object. | 
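As a usage sketch, the helper below builds datasets for a chosen subset of clients and checks membership in `client_ids` first. The function name `datasets_for_clients` is hypothetical; it relies only on the `client_ids` attribute and the `create_tf_dataset_for_client` method documented here, and the explicit `ValueError` is a choice made for this example rather than documented library behavior.

```python
def datasets_for_clients(client_data, wanted_ids):
    """Returns a dict mapping each requested client ID to its dataset.

    `client_data` is assumed to expose `client_ids` and
    `create_tf_dataset_for_client`, as described in this reference.
    """
    known_ids = set(client_data.client_ids)
    unknown = [cid for cid in wanted_ids if cid not in known_ids]
    if unknown:
        # Fail early rather than relying on the library's own error.
        raise ValueError(f"Unknown client IDs: {unknown}")
    return {
        cid: client_data.create_tf_dataset_for_client(cid)
        for cid in wanted_ids
    }
```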
create_tf_dataset_from_all_clients
create_tf_dataset_from_all_clients(
    seed: Optional[Union[int, Sequence[int]]] = None
) -> tf.data.Dataset
Creates a new tf.data.Dataset containing all client examples.
This function is intended for training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.
Currently, the implementation produces a dataset in which all examples from a single client appear consecutively, so additional shuffling should generally be performed.
| Args | |
|---|---|
| seed | Optional, a seed to determine the order in which clients are
processed in the joined dataset. The seed can be any nonnegative 32-bit
integer, an array of such integers, or None. | 
| Returns | |
|---|---|
| A tf.data.Dataset object. | 
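Because examples from one client appear together in the joined dataset, a centralized baseline usually adds a shuffle step. The sketch below shows one way to do that with `tf.data.Dataset.shuffle`; the helper name `centralized_dataset` and the default buffer size are assumptions for illustration, not recommended values.

```python
def centralized_dataset(
    client_data,
    shuffle_buffer_size=10_000,
    client_order_seed=None,
    shuffle_seed=None,
):
    """Joins all client examples and shuffles them across clients.

    `client_order_seed` is forwarded to create_tf_dataset_from_all_clients;
    `shuffle_seed` is an integer passed to tf.data.Dataset.shuffle.
    """
    dataset = client_data.create_tf_dataset_from_all_clients(
        seed=client_order_seed
    )
    # A buffer smaller than the dataset only mixes examples locally, so
    # larger buffers give a better approximation of a uniform shuffle.
    return dataset.shuffle(shuffle_buffer_size, seed=shuffle_seed)
```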
datasets
datasets(
    limit_count: Optional[int] = None,
    seed: Optional[Union[int, Sequence[int]]] = None
) -> Iterable[tf.data.Dataset]
Yields the tf.data.Dataset for each client in random order.
This function is intended for building a static array of client data to be provided to the top-level federated computation.
| Args | |
|---|---|
| limit_count | Optional, a maximum number of datasets to return. | 
| seed | Optional, a seed to determine the order in which clients are
processed in the joined dataset. The seed can be any nonnegative 32-bit
integer, an array of such integers, or None. | 
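A short sketch of the intended use: materializing a fixed-size list of client datasets for one round of simulation. The helper name `sample_client_datasets` is hypothetical; it only forwards the `limit_count` and `seed` arguments documented above.

```python
def sample_client_datasets(client_data, num_clients, seed=None):
    """Returns a list with at most `num_clients` client datasets.

    The iterable returned by `datasets` is consumed eagerly so the result
    can be passed around as a static list.
    """
    return list(client_data.datasets(limit_count=num_clients, seed=seed))
```

Passing the same `seed` is what makes the client order reproducible across runs, per the argument description above.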
from_clients_and_tf_fn
@classmethod
from_clients_and_tf_fn(
    client_ids: Iterable[str],
    serializable_dataset_fn: Callable[[str], tf.data.Dataset]
) -> 'ClientData'
Constructs a ClientData based on the given function.
| Args | |
|---|---|
| client_ids | A non-empty list of strings to use as input to serializable_dataset_fn. | 
| serializable_dataset_fn | A function that takes a client_id from the above
list, and returns a tf.data.Dataset. This function must be
serializable and usable within the context of a tf.function and tff.Computation. | 
| Raises | |
|---|---|
| TypeError | If serializable_dataset_fn is a tff.Computation. | 
| Returns | |
|---|---|
| A ClientData object. | 
preprocess
preprocess(
    preprocess_fn: Callable[[tf.data.Dataset], tf.data.Dataset]
) -> 'ClientData'
Applies preprocess_fn to each client's data.
| Args | |
|---|---|
| preprocess_fn | A callable accepting a tf.data.Dataset and returning a
preprocessed tf.data.Dataset. This function must be traceable by TF. | 
| Returns | |
|---|---|
| A tff.simulation.datasets.ClientData. | 
| Raises | |
|---|---|
| IncompatiblePreprocessFnError | If preprocess_fn is a tff.Computation. | 
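A `preprocess_fn` is typically a plain Python function composed of `tf.data.Dataset` transformations, which keeps it traceable by TF. The sketch below is one possible shape; the factory name `make_preprocess_fn` and its default values are assumptions for illustration.

```python
def make_preprocess_fn(batch_size=20, shuffle_buffer_size=100, num_epochs=1):
    """Builds a function mapping a tf.data.Dataset to a preprocessed one."""

    def preprocess_fn(dataset):
        # Only tf.data.Dataset methods are used, so the function stays
        # traceable by TF; it is a plain callable, not a tff.Computation.
        return (
            dataset.shuffle(shuffle_buffer_size)
            .repeat(num_epochs)
            .batch(batch_size)
        )

    return preprocess_fn
```

The result would be applied as `client_data.preprocess(make_preprocess_fn())`, which returns a new `ClientData` whose per-client datasets are shuffled, repeated, and batched.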
train_test_client_split
@classmethod
train_test_client_split(
    client_data: 'ClientData',
    num_test_clients: int,
    seed: Optional[Union[int, Sequence[int]]] = None
) -> tuple['ClientData', 'ClientData']
Returns a pair of (train, test) ClientData.
This method partitions the clients of client_data into two ClientData
objects with disjoint sets of ClientData.client_ids. All clients in the
test ClientData are guaranteed to have non-empty datasets, but the
training ClientData may have clients with no data.
| Args | |
|---|---|
| client_data | The base ClientData to split. | 
| num_test_clients | How many clients to hold out for testing. This can be at
most len(client_data.client_ids) - 1, since we don't want to produce
empty ClientData. | 
| seed | Optional seed to fix shuffling of clients before splitting. The seed
can be any nonnegative 32-bit integer, an array of such integers, or None. | 
| Returns | |
|---|---|
| A pair (train_client_data, test_client_data), where test_client_data
has num_test_clients selected at random, subject to the constraint they
each have at least 1 batch in their dataset. | 
| Raises | |
|---|---|
| ValueError | If num_test_clients cannot be satisfied by client_data,
or too many clients have empty datasets. |
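The partitioning behavior described for `train_test_client_split` can be illustrated on plain client IDs. This is a sketch of the documented semantics in pure Python, not the library's implementation: the function name `split_client_ids` is hypothetical, emptiness is judged from example counts (for instance, the `num_examples` column) rather than from batches, the lower bound on `num_test_clients` is an assumption, and only an integer or `None` seed is supported.

```python
import random


def split_client_ids(num_examples_by_client, num_test_clients, seed=None):
    """Partitions client IDs into disjoint (train_ids, test_ids) lists.

    `num_examples_by_client` maps each client ID to its example count.
    """
    client_ids = sorted(num_examples_by_client)
    # Upper bound from the docs: at most len(client_ids) - 1 test clients.
    # Requiring at least one test client is an assumption of this sketch.
    if not 0 < num_test_clients < len(client_ids):
        raise ValueError("num_test_clients cannot be satisfied.")

    rng = random.Random(seed)
    rng.shuffle(client_ids)

    train_ids, test_ids = [], []
    for client_id in client_ids:
        has_data = num_examples_by_client[client_id] > 0
        if len(test_ids) < num_test_clients and has_data:
            # Test clients must be non-empty.
            test_ids.append(client_id)
        else:
            # Training clients may be empty.
            train_ids.append(client_id)

    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")
    return train_ids, test_ids
```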