

Object to hold a federated dataset.

The federated dataset is represented as a list of client ids, and a function to look up the local dataset for each client id.
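As a rough illustration of this representation, the sketch below models a federated dataset with plain Python lists in place of real datasets. ToyClientData, dataset_for_client, and the sample data are hypothetical names made up for this example and are not part of TFF; only the shape of the idea (a list of client ids plus a lookup function) comes from the description above.

```python
from typing import Callable, Dict, List


class ToyClientData:
    """Hypothetical stand-in: a list of client ids plus a lookup function."""

    def __init__(self, client_ids: List[str],
                 lookup_fn: Callable[[str], List[int]]):
        self.client_ids = list(client_ids)
        self._lookup_fn = lookup_fn

    def dataset_for_client(self, client_id: str) -> List[int]:
        # Reject ids that are not part of this federated dataset.
        if client_id not in self.client_ids:
            raise KeyError(f"unknown client id: {client_id}")
        return self._lookup_fn(client_id)


# Made-up local datasets, keyed by client id.
_local_data: Dict[str, List[int]] = {"a": [1, 2], "b": [3], "c": []}
toy = ToyClientData(sorted(_local_data), _local_data.__getitem__)
```

Here toy.dataset_for_client("b") looks up the local examples for client "b", and toy.client_ids lists every client that can be queried.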

Each client's local dataset is represented as a tf.data.Dataset, but generally this class (and the corresponding datasets hosted by TFF) can easily be consumed by any Python-based ML framework as numpy arrays:

import tensorflow as tf
import tensorflow_federated as tff
import tensorflow_datasets as tfds

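# Assumes `client_data` is an existing ClientData instance (hypothetical name).
sampled_client_ids = client_data.client_ids
for client_id in sampled_client_ids[:5]:
  client_local_dataset = tfds.as_numpy(
      client_data.create_tf_dataset_for_client(client_id))
  # client_local_dataset is an iterable of structures of numpy arrays
  for example in client_local_dataset:
    print(example)  # e.g. inspect each example or pass it to another framework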

To construct ClientData objects for testing purposes, see the tff.simulation.datasets.TestClientData class, which provides an easy way to construct toy federated datasets.

Attributes:

client_ids: A list of string identifiers for clients in this dataset.

dataset_computation: A tff.Computation accepting a client ID, returning a dataset. ClientData implementations that don't support dataset_computation should raise NotImplementedError if this attribute is accessed.

element_type_structure: The element type information of the client datasets. A nested structure of tf.TensorSpec objects defining the type of the elements returned by datasets in this ClientData object.



create_tf_dataset_for_client

Creates a new tf.data.Dataset containing the client training examples.

Args:
client_id: The string client_id for the desired client.

Returns:
A tf.data.Dataset object.


create_tf_dataset_from_all_clients

Creates a new tf.data.Dataset containing all client examples.

This function is intended for training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.

Currently, the implementation produces a dataset in which all examples from a single client appear together, in order, so additional shuffling should generally be performed.

Args:
seed: Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any 32-bit unsigned integer or an array of such integers.

Returns:
A tf.data.Dataset object.
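The ordering caveat above can be seen in a small pure-Python sketch. It is only a simplified model, not TFF's implementation: join_all_clients and toy_data are hypothetical, Python lists stand in for datasets, and the standard library's random module stands in for whatever random source the real method uses.

```python
import random
from typing import Dict, List, Optional


def join_all_clients(data: Dict[str, List[int]],
                     seed: Optional[int] = None) -> List[int]:
    """Concatenates per-client examples, visiting clients in a seeded order."""
    client_ids = sorted(data)
    random.Random(seed).shuffle(client_ids)
    joined: List[int] = []
    for client_id in client_ids:
        # Each client's examples stay contiguous and in their original order.
        joined.extend(data[client_id])
    return joined


# Made-up data: three clients with a few integer "examples" each.
toy_data = {"a": [1, 2, 3], "b": [10, 20], "c": [100]}
joined = join_all_clients(toy_data, seed=0)

# Example-level shuffle, so nearby examples can come from different clients.
shuffled = list(joined)
random.Random(1).shuffle(shuffled)
```

The seed only changes which client comes first; within joined, each client's examples remain grouped. With a real tf.data.Dataset, the analogous extra step would typically be a call to its shuffle method with a suitable buffer size.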


datasets

Yields the tf.data.Dataset for each client in random order.

This function is intended for building a static array of client data to be provided to the top-level federated computation.

Args:
limit_count: Optional, a maximum number of datasets to return.
seed: Optional, a seed to determine the order in which clients are processed. The seed can be any 32-bit unsigned integer or an array of such integers.
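A minimal sketch of the behavior described above, under the assumption that limit_count simply truncates the randomly ordered list of clients. iter_client_datasets and toy_data are hypothetical names; lists stand in for datasets, and this is not TFF's implementation.

```python
import random
from typing import Dict, Iterator, List, Optional


def iter_client_datasets(data: Dict[str, List[int]],
                         limit_count: Optional[int] = None,
                         seed: Optional[int] = None) -> Iterator[List[int]]:
    """Yields each client's examples in a seeded random client order."""
    client_ids = sorted(data)
    random.Random(seed).shuffle(client_ids)
    if limit_count is not None:
        # Keep only the first limit_count clients of the shuffled order.
        client_ids = client_ids[:limit_count]
    for client_id in client_ids:
        yield data[client_id]


toy_data = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
# Materialize a static list of at most two client datasets.
static_client_data = list(
    iter_client_datasets(toy_data, limit_count=2, seed=42))
```

Passing the same seed reproduces the same selection of clients, which makes a simulation round repeatable.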


from_clients_and_fn

Constructs a ClientData based on the given function.

Args:
client_ids: A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn.
create_tf_dataset_for_client_fn: A function that takes a client_id from the above list, and returns a tf.data.Dataset. If this function is additionally a tff.Computation, the constructed ClientData will expose a dataset_computation attribute which can be used for high-performance distributed simulations.

Returns:
A ClientData.


preprocess

Applies preprocess_fn to each client's data.
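One way to picture this is as wrapping the per-client lookup function, assuming preprocess_fn maps one client's dataset to a new dataset. The sketch below uses lists as stand-in datasets; preprocess_lookup, doubled_lookup, and toy_data are hypothetical names and do not reflect how TFF implements the method.

```python
from typing import Callable, Dict, List

Dataset = List[int]  # stand-in for a client's local dataset


def preprocess_lookup(lookup_fn: Callable[[str], Dataset],
                      preprocess_fn: Callable[[Dataset], Dataset]
                      ) -> Callable[[str], Dataset]:
    """Returns a lookup function whose results pass through preprocess_fn."""
    def wrapped(client_id: str) -> Dataset:
        # Applied lazily, each time a client's dataset is requested.
        return preprocess_fn(lookup_fn(client_id))
    return wrapped


toy_data: Dict[str, Dataset] = {"a": [1, 2], "b": [3]}
doubled_lookup = preprocess_lookup(toy_data.__getitem__,
                                   lambda ds: [2 * x for x in ds])
```

The underlying data is left untouched; only the datasets handed back by the wrapped lookup are transformed.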


train_test_client_split

Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

Args:
client_data: The base ClientData to split.
num_test_clients: How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData.
seed: Optional seed to fix shuffling of clients before splitting.

Returns:
A pair (train_client_data, test_client_data), where test_client_data has num_test_clients selected at random, subject to the constraint that they each have at least 1 batch in their dataset.

Raises:
ValueError: If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets.
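The constraints above can be sketched in plain Python. This is a simplified model under stated assumptions, not TFF's implementation: split_clients and toy_data are hypothetical, lists stand in for datasets, "non-empty" is taken to mean a non-empty list, and the lower bound of one test client is an assumption added for the example.

```python
import random
from typing import Dict, List, Optional, Tuple


def split_clients(data: Dict[str, List[int]], num_test_clients: int,
                  seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Returns (train_ids, test_ids): disjoint ids, non-empty test clients."""
    client_ids = sorted(data)
    # Upper bound from the text above; the lower bound is an assumption.
    if num_test_clients <= 0 or num_test_clients > len(client_ids) - 1:
        raise ValueError("num_test_clients must be in [1, num_clients - 1]")
    random.Random(seed).shuffle(client_ids)
    test_ids: List[str] = []
    train_ids: List[str] = []
    for client_id in client_ids:
        # Only clients with data may be held out; the rest go to training.
        if len(test_ids) < num_test_clients and data[client_id]:
            test_ids.append(client_id)
        else:
            train_ids.append(client_id)
    if len(test_ids) < num_test_clients:
        raise ValueError("too many clients have empty datasets")
    return train_ids, test_ids


toy_data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train_ids, test_ids = split_clients(toy_data, num_test_clients=2, seed=7)
```

In this model the empty client "b" can only ever land in the training split, which mirrors the guarantee that test clients have data while training clients may not.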