Using TFF for Federated Learning Research


TFF is an extensible, powerful framework for conducting federated learning (FL) research by simulating federated computations on realistic proxy datasets. This page describes the main concepts and components that are relevant for research simulations, as well as detailed guidance for conducting different kinds of research in TFF.

The typical structure of research code in TFF

A research FL simulation implemented in TFF typically consists of three main types of logic.

  1. Individual pieces of TensorFlow code, typically tf.functions, that encapsulate logic that runs in a single location (e.g., on clients or on a server). This code is typically written and tested without any tff.* references, and can be re-used outside of TFF. For example, the client training loop in Federated Averaging is implemented at this level.

  2. TensorFlow Federated orchestration logic, which binds together the individual tf.functions from 1. by wrapping them as tff.tensorflow.computations and then orchestrating them using abstractions like tff.federated_broadcast and tff.federated_mean inside a tff.federated_computation. See, for example, this orchestration for Federated Averaging.

  3. An outer driver script that simulates the control logic of a production FL system, selecting simulated clients from a dataset and then executing federated computations defined in 2. on those clients. For example, a Federated EMNIST experiment driver.
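The three layers above can be illustrated with a framework-free sketch. The code below is plain Python, not TFF: `client_update`, `run_round`, `run_simulation`, and the toy one-parameter model are hypothetical names chosen for illustration. They stand in for the tf.function, the tff.federated_computation, and the driver script, respectively.

```python
import random

# Layer 1: local logic that runs in a single location (here, on one client).
# In TFF this would typically be a tf.function; plain Python is used here.
def client_update(weight, examples, lr=0.1):
    """Runs one pass of SGD on a 1-D least-squares model y = weight * x."""
    for x, y in examples:
        grad = 2.0 * (weight * x - y) * x
        weight -= lr * grad
    return weight

# Layer 2: orchestration. Broadcast the server weight, run the local update
# on every selected client, and take the unweighted mean of the results.
def run_round(server_weight, client_datasets):
    client_weights = [client_update(server_weight, ds) for ds in client_datasets]
    return sum(client_weights) / len(client_weights)

# Layer 3: driver. Select simulated clients each round and execute the round.
def run_simulation(all_clients, num_rounds=20, clients_per_round=3, seed=0):
    rng = random.Random(seed)
    weight = 0.0
    for _ in range(num_rounds):
        sampled = rng.sample(all_clients, clients_per_round)
        weight = run_round(weight, sampled)
    return weight

# Toy federated dataset: every client holds points on the line y = 3x.
clients = [[(x, 3.0 * x) for x in (0.5, 1.0, 1.5)] for _ in range(10)]
final_weight = run_simulation(clients)
```

Because layer 1 has no dependency on layers 2 and 3, it can be unit-tested on its own, which mirrors the separation described in the list.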

Federated learning datasets

TensorFlow Federated hosts multiple datasets that are representative of the characteristics of real-world problems that could be solved with federated learning.

Datasets include:

  • StackOverflow. A realistic text dataset for language modeling or supervised learning tasks, with 342,477 unique users and 135,818,730 examples (sentences) in the training set.

  • Federated EMNIST. A federated pre-processing of the EMNIST character and digit dataset, where each client corresponds to a different writer. The full train set contains 3400 users with 671,585 examples from 62 labels.

  • Shakespeare. A smaller char-level text dataset based on the complete works of William Shakespeare. The data set consists of 715 users (characters of Shakespeare plays), where each example corresponds to a contiguous set of lines spoken by the character in a given play.

  • CIFAR-100. A federated partitioning of the CIFAR-100 dataset across 500 training clients and 100 test clients. Each client has 100 unique examples. The partitioning is done in a way to create more realistic heterogeneity between clients. For more details, see the API.

  • Google Landmark v2. The dataset consists of photos of various world landmarks, with images grouped by photographer to achieve a federated partitioning of the data. Two flavors of the dataset are available: a smaller dataset with 233 clients and 23,080 images, and a larger dataset with 1,262 clients and 164,172 images.

  • CelebA. A dataset of examples (image and facial attributes) of celebrity faces. The federated dataset has each celebrity's examples grouped together to form a client. There are 9,343 clients, each with at least 5 examples. The dataset can be split into train and test groups either by clients or by examples.

  • iNaturalist. A dataset of photos of various species. The dataset contains 120,300 images for 1,203 species. Seven flavors of the dataset are available. One is grouped by photographer and consists of 9,257 clients. The other six are grouped by the geolocation where the photo was taken and consist of between 11 and 3,606 clients.

High performance simulations

While the wall-clock time of an FL simulation is not a relevant metric for evaluating algorithms (as simulation hardware isn't representative of real FL deployment environments), being able to run FL simulations quickly is critical for research productivity. Hence, TFF has invested heavily in providing high-performance single-machine and multi-machine runtimes. Documentation is under development, but for now see the instructions on TFF simulations with accelerators and the instructions on setting up simulations with TFF on GCP. The high-performance TFF runtime is enabled by default.

TFF for different research areas

Federated optimization algorithms

Research on federated optimization algorithms can be done in different ways in TFF, depending on the desired level of customization.

A minimal stand-alone implementation of the Federated Averaging algorithm is provided here. The code includes TF functions for local computation, TFF computations for orchestration, and a driver script on the EMNIST dataset as an example. These files can easily be adapted for customized applications and algorithmic changes by following the detailed instructions in the README.

A more general implementation of Federated Averaging can be found here. This implementation allows for more sophisticated optimization techniques, including the use of different optimizers on both the server and client. Other federated learning algorithms, including federated k-means clustering, can be found here.
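One common way to use different optimizers on the server and on the clients is to treat the averaged client model delta as a pseudo-gradient for the server optimizer. The sketch below is plain Python under that assumption. `client_delta`, `server_step`, the momentum rule, and all hyperparameters are illustrative choices, not the API of the implementation referenced above.

```python
def client_delta(weights, grad_fn, client_lr, local_steps):
    """Client optimizer: plain SGD. Returns the model delta (new - old)."""
    local = list(weights)
    for _ in range(local_steps):
        grads = grad_fn(local)
        local = [w - client_lr * g for w, g in zip(local, grads)]
    return [l - w for l, w in zip(local, weights)]

def server_step(weights, velocity, mean_delta, server_lr, momentum):
    """Server optimizer: SGD with momentum on the pseudo-gradient -mean_delta."""
    velocity = [momentum * v - d for v, d in zip(velocity, mean_delta)]
    weights = [w - server_lr * v for w, v in zip(weights, velocity)]
    return weights, velocity

# Each hypothetical client minimizes ||w - target||^2 for its own target.
targets = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
grad_fns = [
    lambda w, t=t: [2.0 * (wi - ti) for wi, ti in zip(w, t)] for t in targets
]

weights, velocity = [0.0, 0.0], [0.0, 0.0]
for _ in range(50):
    deltas = [
        client_delta(weights, g, client_lr=0.1, local_steps=5) for g in grad_fns
    ]
    mean_delta = [sum(col) / len(deltas) for col in zip(*deltas)]
    weights, velocity = server_step(
        weights, velocity, mean_delta, server_lr=1.0, momentum=0.5
    )
```

With `server_lr=1.0` and `momentum=0.0`, the server step reduces to adding the mean delta to the weights, which is the unweighted form of Federated Averaging. Swapping in another server rule changes only `server_step`.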

Model update compression

Lossy compression of model updates can lead to reduced communication costs, which in turn can lead to reduced overall training time.

To reproduce a recent paper, see this research project. To implement a custom compression algorithm, see comparison_methods in the project for example baselines, and the TFF Aggregators tutorial if you are not already familiar with aggregators.
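As a concrete illustration of lossy compression, the sketch below applies uniform quantization to a model update. It is plain Python written for this page: `quantize`, `dequantize`, and the choice of 4 bits are hypothetical, and it is not one of the baselines in the research project referenced above.

```python
def quantize(update, num_bits=4):
    """Maps each float to an integer level in [0, 2**num_bits - 1]."""
    lo, hi = min(update), max(update)
    levels = 2 ** num_bits - 1
    # Guard against a constant update, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((u - lo) / scale) for u in update]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstructs approximate floats from integer levels."""
    return [lo + c * scale for c in codes]

update = [-0.8, -0.1, 0.0, 0.25, 0.9]
codes, lo, scale = quantize(update, num_bits=4)
restored = dequantize(codes, lo, scale)

# The reconstruction error is bounded by half a quantization step.
max_error = max(abs(u - r) for u, r in zip(update, restored))
```

In this sketch each coordinate is sent as a 4-bit code plus two floats of shared metadata (`lo` and `scale`), trading a bounded reconstruction error for a smaller payload.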

Differential privacy

TFF is interoperable with the TensorFlow Privacy library to enable research in new algorithms for federated training of models with differential privacy. For an example of training with DP using the basic DP-FedAvg algorithm and extensions, see this experiment driver.

If you want to implement a custom DP algorithm and apply it to the aggregate updates of federated averaging, you can implement a new DP mean algorithm as a subclass of tensorflow_privacy.DPQuery and create a tff.aggregators.DifferentiallyPrivateFactory with an instance of your query. An example of implementing the DP-FTRL algorithm can be found here.
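The core of a DP mean over client updates is to bound each client's contribution and add calibrated noise to the aggregate. The sketch below shows that clip-and-noise pattern in plain Python. It does not use the tensorflow_privacy.DPQuery interface or tff.aggregators.DifferentiallyPrivateFactory; `clip_by_l2_norm`, `dp_mean`, and the parameter values are hypothetical, and no privacy accounting is performed.

```python
import math
import random

def clip_by_l2_norm(update, clip_norm):
    """Scales the update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * factor for u in update]

def dp_mean(updates, clip_norm, noise_multiplier, rng):
    """Clips each update, sums, adds Gaussian noise, and divides by the count."""
    clipped = [clip_by_l2_norm(u, clip_norm) for u in updates]
    total = [sum(col) for col in zip(*clipped)]
    # Noise is scaled to the largest possible contribution of a single client.
    stddev = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, stddev) for t in total]
    return [n / len(updates) for n in noisy]

rng = random.Random(0)
updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
clipped = [clip_by_l2_norm(u, clip_norm=1.0) for u in updates]
noiseless_mean = dp_mean(updates, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
private_mean = dp_mean(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Setting `noise_multiplier=0.0` isolates the effect of clipping, which is useful when checking how much signal the clip norm alone removes.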

Federated GANs (described below) are another example of a TFF project implementing user-level differential privacy (e.g., here in code).

Robustness and attacks

TFF can also be used to simulate the targeted attacks on federated learning systems and the differential-privacy-based defenses considered in Can You Really Backdoor Federated Learning? This is done by building an iterative process with potentially malicious clients (see build_federated_averaging_process_attacked). The targeted_attack directory contains more details.

  • New attack algorithms can be implemented by writing a client update function as a TensorFlow function; see ClientProjectBoost for an example.
  • New defenses can be implemented by customizing tff.utils.StatefulAggregateFn, which aggregates client outputs to get a global update.
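
The interaction between a boosted malicious update and a norm-bounding defense can be sketched in a few lines. This is plain Python written for illustration: `boosted_update`, `norm_bounded_mean`, and the numbers are hypothetical, and the code is not the ClientProjectBoost or tff.utils.StatefulAggregateFn implementation mentioned in the list.

```python
import math

def boosted_update(malicious_update, boost_factor):
    """Attack: scale the malicious update so that it survives averaging."""
    return [boost_factor * u for u in malicious_update]

def plain_mean(updates):
    """Undefended aggregation: unweighted mean of client updates."""
    return [sum(col) / len(updates) for col in zip(*updates)]

def norm_bounded_mean(updates, max_norm):
    """Defense: clip every client update to max_norm before averaging."""
    clipped = []
    for update in updates:
        norm = math.sqrt(sum(u * u for u in update))
        factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
        clipped.append([u * factor for u in update])
    return plain_mean(clipped)

honest = [[0.1, 0.0], [0.12, 0.02], [0.08, -0.02]]
# One attacker pushes the model along the second coordinate, boosted by the
# number of participating clients to cancel out the averaging.
attacker = boosted_update([0.0, 1.0], boost_factor=4)

undefended = plain_mean(honest + [attacker])
defended = norm_bounded_mean(honest + [attacker], max_norm=0.2)
```

In this toy setting the honest updates fall below the norm bound and pass through unchanged, so only the attacker's contribution is reduced.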

For an example script for simulation, see

Generative Adversarial Networks

GANs make for an interesting federated orchestration pattern that looks a little different than standard Federated Averaging. They involve two distinct networks (the generator and the discriminator) each trained with their own optimization step.

TFF can be used for research on federated training of GANs. For example, the DP-FedAvg-GAN algorithm presented in recent work is implemented in TFF. This work demonstrates the effectiveness of combining federated learning, generative models, and differential privacy.


Personalization

Personalization in the setting of federated learning is an active research area. The goal of personalization is to provide different inference models to different users. There are several possible approaches to this problem.

One approach is to let each client fine-tune a single global model (trained using federated learning) with their local data. This approach has connections to meta-learning; see, e.g., this paper. An example of this approach is given in

To explore and compare different personalization strategies, you can:

  • Define a personalization strategy by implementing a tf.function that starts from an initial model, trains and evaluates a personalized model using each client's local datasets. An example is given by build_personalize_fn.

  • Define an OrderedDict that maps strategy names to the corresponding personalization strategies, and use it as the personalize_fn_dict argument in tff.learning.build_personalization_eval_computation.
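
The two steps above can be sketched without TFF. In the plain-Python sketch below, `make_personalize_fn`, the one-parameter model, and the strategy names are hypothetical. The OrderedDict plays the role of the personalize_fn_dict argument, but the real argument expects functions with a different signature, so treat this only as an outline of the pattern.

```python
from collections import OrderedDict

def mean_squared_error(weight, examples):
    return sum((weight * x - y) ** 2 for x, y in examples) / len(examples)

def make_personalize_fn(lr, num_epochs):
    """Builds a strategy: fine-tune from the global weight, then evaluate."""
    def personalize_fn(global_weight, train_examples, test_examples):
        weight = global_weight
        for _ in range(num_epochs):
            for x, y in train_examples:
                weight -= lr * 2.0 * (weight * x - y) * x
        return {"test_mse": mean_squared_error(weight, test_examples)}
    return personalize_fn

# Map strategy names to strategies so they can be compared side by side.
personalize_fn_dict = OrderedDict(
    no_finetuning=make_personalize_fn(lr=0.0, num_epochs=0),
    sgd_1_epoch=make_personalize_fn(lr=0.1, num_epochs=1),
    sgd_5_epochs=make_personalize_fn(lr=0.1, num_epochs=5),
)

# One hypothetical client whose data follows y = 2x; the global weight is 1.0.
train = [(0.5, 1.0), (1.0, 2.0)]
test = [(1.5, 3.0), (2.0, 4.0)]
results = OrderedDict(
    (name, fn(1.0, train, test)) for name, fn in personalize_fn_dict.items()
)
```

Keeping a no-fine-tuning entry in the dictionary gives a baseline against which the personalized strategies can be compared on the same client.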

Another approach is to avoid training a fully global model by training part of a model entirely locally. An instantiation of this approach is described in this blog post. This approach is also connected to meta-learning; see this paper. To explore partially local federated learning, you can: