This tutorial is the first part of a two-part series that demonstrates how to implement custom types of federated algorithms in TensorFlow Federated (TFF) using the Federated Core (FC) - a set of lower-level interfaces that serve as a foundation upon which we have implemented the Federated Learning (FL) layer.
This first part is more conceptual; we introduce some of the key concepts and
programming abstractions used in TFF, and we demonstrate their use on a very
simple example with a distributed array of temperature sensors. In
the second part of this series, we use
the mechanisms we introduce here to implement a simple version of federated
training and evaluation algorithms. As a follow-up, we encourage you to study
the implementation of federated averaging in tff.learning.
By the end of this series, you should be able to recognize that the applications of Federated Core are not necessarily limited to learning. The programming abstractions we offer are quite generic, and could be used, e.g., to implement analytics and other custom types of computations over distributed data.
Although this tutorial is designed to be self-contained, we encourage you to
first read tutorials on image classification and text generation for a
higher-level and more gentle introduction to the TensorFlow Federated framework
and the Federated Learning APIs (tff.learning), as
it will help you put the concepts we describe here in context.
Intended Uses
In a nutshell, Federated Core (FC) is a development environment that makes it possible to compactly express program logic that combines TensorFlow code with distributed communication operators, such as those that are used in Federated Averaging - computing distributed sums, averages, and other types of distributed aggregations over a set of client devices in the system, broadcasting models and parameters to those devices, etc.
You may be aware of tf.contrib.distribute,
and a natural question to ask at this point may be: in what ways does this
framework differ? Both frameworks attempt to make TensorFlow computations
distributed, after all.
One way to think about it is that the stated goal of tf.contrib.distribute is
to allow users to use existing models and training code with minimal changes to
enable distributed training, with much of the focus on how to take advantage of
distributed infrastructure to make existing training code more efficient. The
goal of TFF's Federated Core, by contrast, is to give researchers and
practitioners explicit control over the specific patterns of distributed
communication they will use in their systems. The focus in FC is on providing a
flexible and extensible language for expressing distributed data flow
algorithms, rather than a concrete set of implemented distributed training
capabilities.
One of the primary target audiences for TFF's FC API is researchers and practitioners who might want to experiment with new federated learning algorithms and evaluate the consequences of subtle design choices that affect the manner in which the flow of data in the distributed system is orchestrated, yet without getting bogged down by system implementation details. The level of abstraction that FC API is aiming for roughly corresponds to pseudocode one could use to describe the mechanics of a federated learning algorithm in a research publication - what data exists in the system and how it is transformed, but without dropping to the level of individual point-to-point network message exchanges.
TFF as a whole is targeting scenarios in which data is distributed, and must remain such, e.g., for privacy reasons, and where collecting all data at a centralized location may not be a viable option. This has implications for the implementation of machine learning algorithms, which require an increased degree of explicit control compared to scenarios in which all data can be accumulated in a centralized location at a data center.
Before we start
Before we dive into the code, please try to run the following "Hello World" example to make sure your environment is correctly set up. If it doesn't work, please refer to the Installation guide for instructions.
pip install --quiet --upgrade tensorflow-federated
import collections
import numpy as np
import tensorflow as tf
import tensorflow_federated as tff
@tff.federated_computation
def hello_world():
  return 'Hello, World!'

hello_world()
b'Hello, World!'
Federated data
One of the distinguishing features of TFF is that it allows you to compactly express TensorFlow-based computations on federated data. We will be using the term federated data in this tutorial to refer to a collection of data items hosted across a group of devices in a distributed system. For example, applications running on mobile devices may collect data and store it locally, without uploading to a centralized location. Or, an array of distributed sensors may collect and store temperature readings at their locations.
Federated data like those in the above examples are treated in TFF as first-class citizens, i.e., they may appear as parameters and results of functions, and they have types. To reinforce this notion, we will refer to federated data sets as federated values, or as values of federated types.
The important point to understand is that we are modeling the entire collection of data items across all devices (e.g., the entire collection of temperature readings from all sensors in a distributed array) as a single federated value.
For example, here's how one would define in TFF the type of a federated float hosted by a group of client devices. A collection of temperature readings that materialize across an array of distributed sensors could be modeled as a value of this federated type.
federated_float_on_clients = tff.FederatedType(np.float32, tff.CLIENTS)
More generally, a federated type in TFF is defined by specifying the type T of
its member constituents - the items of data that reside on individual devices,
and the group G of devices on which federated values of this type are hosted
(plus a third, optional bit of information we'll mention shortly). We refer to
the group G of devices hosting a federated value as the value's placement.
Thus, tff.CLIENTS is an example of a placement.
str(federated_float_on_clients.member)
'float32'
str(federated_float_on_clients.placement)
'CLIENTS'
A federated type with member constituents T and placement G can be
represented compactly as {T}@G, as shown below.
str(federated_float_on_clients)
'{float32}@CLIENTS'
The curly braces {} in this concise notation serve as a reminder that the
member constituents (items of data on different devices) may differ, as you
would expect, e.g., of temperature sensor readings, so the clients as a group are
jointly hosting a multi-set of T-typed items that together constitute the
federated value.
It is important to note that the member constituents of a federated value are
generally opaque to the programmer, i.e., a federated value should not be
thought of as a simple dict keyed by an identifier of a device in the system -
these values are intended to be collectively transformed only by federated
operators that abstractly represent various kinds of distributed communication
protocols (such as aggregation). If this sounds too abstract, don't worry - we
will return to this shortly, and we will illustrate it with concrete examples.
Federated types in TFF come in two flavors: those where the member constituents
of a federated value may differ (as just seen above), and those where they are
known to be all equal. This is controlled by the third, optional all_equal
parameter in the tff.FederatedType constructor (defaulting to False).
federated_float_on_clients.all_equal
False
A federated type with a placement G in which all of the T-typed member
constituents are known to be equal can be compactly represented as T@G (as
opposed to {T}@G, that is, with the curly braces dropped to reflect the fact
that the multi-set of member constituents consists of a single item).
str(tff.FederatedType(np.float32, tff.CLIENTS, all_equal=True))
'float32@CLIENTS'
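To make the two notations concrete, here is a minimal plain-Python sketch of how the compact type string could be derived from the member type, the placement, and the all_equal bit. The class FederatedTypeSketch and its fields are hypothetical names used only for illustration; this is not how tff.FederatedType is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedTypeSketch:
    """Hypothetical stand-in for a federated type; not part of the TFF API."""

    member: str              # type of each member constituent, e.g. 'float32'
    placement: str           # name of the hosting group, e.g. 'CLIENTS'
    all_equal: bool = False  # True when every member is known to be identical

    def __str__(self) -> str:
        # Braces signal that members may differ; they are dropped when all_equal.
        if self.all_equal:
            return f"{self.member}@{self.placement}"
        return f"{{{self.member}}}@{self.placement}"


readings_type = FederatedTypeSketch("float32", "CLIENTS")
learning_rate_type = FederatedTypeSketch("float32", "CLIENTS", all_equal=True)

print(readings_type)       # {float32}@CLIENTS
print(learning_rate_type)  # float32@CLIENTS
```

The only thing the all_equal bit changes in this sketch is the presence of the braces, which mirrors the distinction between {T}@G and T@G described above.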
One example of a federated value of such type that might arise in practical scenarios is a hyperparameter (such as a learning rate, a clipping norm, etc.) that has been broadcast by a server to a group of devices that participate in federated training.
Another example is a set of parameters for a machine learning model pre-trained at the server, which were then broadcast to a group of client devices, where they can be personalized for each user.
For example, suppose we have a pair of float32 parameters a and b for a
simple one-dimensional linear regression model. We can construct the
(non-federated) type of such models for use in TFF as follows. The angle brackets
<> in the printed type string are a compact TFF notation for named or unnamed
tuples.
simple_regression_model_type = (
    tff.StructType([('a', np.float32), ('b', np.float32)]))

str(simple_regression_model_type)
'<a=float32,b=float32>'
Note that we are only specifying dtypes above. Non-scalar types are also
supported. In the above code, np.float32 is a shortcut notation for the more
general tff.TensorType(np.float32, []).
When this model is broadcast to clients, the type of the resulting federated value can be represented as shown below.
str(tff.FederatedType(
    simple_regression_model_type, tff.CLIENTS, all_equal=True))
'<a=float32,b=float32>@CLIENTS'
By analogy with the federated float above, we will refer to such a type as a federated tuple. More generally, we'll often use the term federated XYZ to refer to a federated value in which member constituents are XYZ-like. Thus, we will talk about things like federated tuples, federated sequences, federated models, and so on.
Now, coming back to float32@CLIENTS - while it appears replicated across
multiple devices, it is actually a single float32, since all members are the
same. In general, you may think of any all-equal federated type, i.e., one of
the form T@G, as isomorphic to a non-federated type T, since in both cases,
there's actually only a single (albeit potentially replicated) item of type T.
Given the isomorphism between T and T@G, you may wonder what purpose, if
any, the latter types might serve. Read on.
Placements
Design Overview
In the preceding section, we've introduced the concept of placements - groups
of system participants that might be jointly hosting a federated value, and
we've demonstrated the use of tff.CLIENTS as an example specification of a
placement.
To explain why the notion of a placement is so fundamental that we needed to incorporate it into the TFF type system, recall what we mentioned at the beginning of this tutorial about some of the intended uses of TFF.
Although in this tutorial, you will only see TFF code being executed locally in a simulated environment, our goal is for TFF to enable writing code that you could deploy for execution on groups of physical devices in a distributed system, potentially including mobile or embedded devices running Android. Each of those devices would receive a separate set of instructions to execute locally, depending on the role it plays in the system (an end-user device, a centralized coordinator, an intermediate layer in a multi-tier architecture, etc.). It is important to be able to reason about which subsets of devices execute what code, and where different portions of the data might physically materialize.
This is especially important when dealing with, e.g., application data on mobile devices. Since the data is private and can be sensitive, we need the ability to statically verify that this data will never leave the device (and prove facts about how the data is being processed). The placement specifications are one of the mechanisms designed to support this.
TFF has been designed as a data-centric programming environment, and as such, unlike some of the existing frameworks that focus on operations and where those operations might run, TFF focuses on data, where that data materializes, and how it's being transformed. Consequently, placement is modeled as a property of data in TFF, rather than as a property of operations on data. Indeed, as you're about to see in the next section, some of the TFF operations span across locations, and run "in the network", so to speak, rather than being executed by a single machine or a group of machines.
Representing the type of a certain value as T@G or {T}@G (as opposed to just
T) makes data placement decisions explicit, and together with a static
analysis of programs written in TFF, it can serve as a foundation for providing
formal privacy guarantees for sensitive on-device data.
An important thing to note at this point, however, is that while we encourage TFF users to be explicit about groups of participating devices that host the data (the placements), the programmer will never deal with the raw data or identities of the individual participants.
Within the body of TFF code, by design, there's no way to enumerate the devices
that constitute the group represented by tff.CLIENTS, or to probe for the
existence of a specific device in the group. There's no concept of a device or
client identity anywhere in the Federated Core API, the underlying set of
architectural abstractions, or the core runtime infrastructure we provide to
support simulations. All the computation logic you write will be expressed as
operations on the entire client group.
Recall here what we mentioned earlier about values of federated types being
unlike a Python dict, in that one cannot simply enumerate their member
constituents. Think of values that your TFF program logic manipulates as being
associated with placements (groups), rather than with individual participants.
Placements are designed to be a first-class citizen in TFF as well, and can
appear as parameters and results of a placement type (to be represented by
tff.PlacementType in the API). In the future, we plan to provide a variety of
operators to transform or combine placements, but this is outside the scope of
this tutorial. For now, it suffices to think of placement as an opaque
primitive built-in type in TFF, similar to how int and bool are opaque
built-in types in Python, with tff.CLIENTS being a constant literal of this
type, not unlike 1 being a constant literal of type int.
Specifying Placements
TFF provides two basic placement literals, tff.CLIENTS and tff.SERVER, to
make it easy to express the rich variety of practical scenarios that are
naturally modeled as client-server architectures, with multiple client devices
(mobile phones, embedded devices, distributed databases, sensors, etc.)
orchestrated by a single centralized server coordinator. TFF is designed to
also support custom placements, multiple client groups, multi-tiered and other,
more general distributed architectures, but discussing them is outside the scope
of this tutorial.
TFF doesn't prescribe what either tff.CLIENTS or tff.SERVER actually
represents.
In particular, tff.SERVER may be a single physical device (a member of a
singleton group), but it might just as well be a group of replicas in a
fault-tolerant cluster running state machine replication - we do not make any
special architectural assumptions. Rather, we use the all_equal bit mentioned
in the preceding section to express the fact that we're generally dealing with
only a single item of data at the server.
Likewise, tff.CLIENTS in some applications might represent all clients in the
system - what in the context of federated learning we sometimes refer to as the
population, but e.g., in production implementations of Federated Averaging,
it may represent a cohort - a subset of the clients selected for participation
in a particular round of training. The abstractly defined placements are given
concrete meaning when a computation in which they appear is deployed for
execution (or simply invoked like a Python function in a simulated environment,
as is demonstrated in this tutorial). In our local simulations, the group of
clients is determined by the federated data supplied as input.
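The distinction between a population and a cohort can be illustrated with a small plain-Python sketch of a simulation driver. The names population, sample_cohort, and the client identifiers are hypothetical and exist only outside of TFF program logic, which, as noted above, has no notion of client identity; real systems may select cohorts very differently.

```python
import random

# Hypothetical population of client identifiers. These live only in the
# simulation driver; TFF program logic itself never sees client identities.
population = [f"client_{i}" for i in range(10)]


def sample_cohort(clients, cohort_size, seed):
    """Picks a subset of clients to take part in one round (illustrative)."""
    rng = random.Random(seed)  # seeded so that the selection is reproducible
    return rng.sample(clients, cohort_size)


cohort = sample_cohort(population, cohort_size=3, seed=0)
print(len(cohort), "of", len(population), "clients selected")
```

In a local simulation, the data belonging to the selected cohort would then be passed as the federated input, which is what gives tff.CLIENTS its concrete meaning for that invocation.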
Federated computations
Declaring federated computations
TFF is designed as a strongly-typed functional programming environment that supports modular development.
The basic unit of composition in TFF is a federated computation - a section of logic that may accept federated values as input and return federated values as output. Here's how you can define a computation that calculates the average of the temperatures reported by the sensor array from our previous example.
@tff.federated_computation(tff.FederatedType(np.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
  return tff.federated_mean(sensor_readings)
Looking at the above code, at this point you might be asking - aren't there
already decorator constructs to define composable units such as
tf.function in TensorFlow, and if so, why introduce yet another one, and how is it
different?
The short answer is that the code generated by the tff.federated_computation
wrapper is neither TensorFlow, nor is it Python - it's a specification of a
distributed system in an internal platform-independent glue language. At this
point, this will undoubtedly sound cryptic, but please bear this intuitive
interpretation of a federated computation as an abstract specification of a
distributed system in mind. We'll explain it in a minute.
First, let's play with the definition a bit. TFF computations are generally
modeled as functions - with or without parameters, but with well-defined type
signatures. You can print the type signature of a computation by querying its
type_signature property, as shown below.
str(get_average_temperature.type_signature)
'({float32}@CLIENTS -> float32@SERVER)'
The type signature tells us that the computation accepts a collection of different sensor readings on client devices, and returns a single average on the server.
Before we go any further, let's reflect on this for a minute - the input and
output of this computation are in different places (on CLIENTS vs. at the
SERVER). Recall what we said in the preceding section on placements about how
TFF operations may span across locations, and run in the network, and what we
just said about federated computations as representing abstract specifications
of distributed systems. We have just defined one such computation - a simple
distributed system in which data is consumed at client devices, and the
aggregate results emerge at the server.
In many practical scenarios, the computations that represent top-level tasks will tend to accept their inputs and report their outputs at the server - this reflects the idea that computations might be triggered by queries that originate and terminate on the server.
However, the FC API does not impose this assumption, and many of the building
blocks we use internally (including numerous tff.federated_... operators you may
find in the API) have inputs and outputs with distinct placements, so in general,
you should not think about a federated computation as something that runs on the
server or is executed by a server. The server is just one type of participant
in a federated computation. In thinking about the mechanics of such
computations, it's best to always default to the global network-wide
perspective, rather than the perspective of a single centralized coordinator.
In general, functional type signatures are compactly represented as (T -> U)
for types T and U of inputs and outputs, respectively. The type of the
formal parameter (such as sensor_readings in this case) is specified as the
argument to the decorator. You don't need to specify the type of the result -
it's determined automatically.
Although TFF does offer limited forms of polymorphism, programmers are strongly encouraged to be explicit about the types of data they work with, as that makes understanding, debugging, and formally verifying properties of your code easier. In some cases, explicitly specifying types is a requirement (e.g., polymorphic computations are currently not directly executable).
Executing federated computations
In order to support development and debugging, TFF allows you to directly invoke
computations defined this way as Python functions, as shown below. Where the
computation expects a value of a federated type with the all_equal bit set to
False, you can feed it as a plain list in Python, and for federated types
with the all_equal bit set to True, you can just directly feed the (single)
member constituent. This is also how the results are reported back to you.
get_average_temperature([68.5, 70.3, 69.8])
69.53334
When running computations like this in simulation mode, you act as an external observer with a system-wide view, who has the ability to supply inputs and consume outputs at any locations in the network, as indeed is the case here - you supplied client values at input, and consumed the server result.
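As a rough mental model of what this invocation computes, the following plain-Python sketch mimics the input and output conventions described above: a list stands for a {float32}@CLIENTS value, and a single number stands for a float32@SERVER value. The function simulated_federated_mean is a hypothetical name, and the sketch says nothing about how TFF actually executes the operator.

```python
def simulated_federated_mean(client_values):
    """Plain-Python model of ({float32}@CLIENTS -> float32@SERVER)."""
    # Input: one value per client (the all_equal=False case, fed as a list).
    # Output: a single value, as it would materialize at the server.
    return sum(client_values) / len(client_values)


server_average = simulated_federated_mean([68.5, 70.3, 69.8])
print(round(server_average, 4))
```

The sketch uses Python floats (double precision), so the trailing digits can differ slightly from the float32 result shown above.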
Now, let's return to a note we made earlier about the
tff.federated_computation decorator emitting code in a glue language.
Although the logic of TFF computations can be expressed as ordinary functions in
Python (you just need to decorate them with tff.federated_computation as we've
done above), and you can directly invoke them with Python arguments just
like any other Python functions in this notebook, behind the scenes, as we noted
earlier, TFF computations are actually not Python.
What we mean by this is that when the Python interpreter encounters a function
decorated with tff.federated_computation, it traces the statements in this
function's body once (at definition time), and then constructs a
serialized representation of the computation's logic for future use - whether
for execution, or to be incorporated as a sub-component into another computation.
You can verify this by adding a print statement, as follows:
@tff.federated_computation(tff.FederatedType(np.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
  print('Getting traced, the argument is "{}".'.format(
      type(sensor_readings).__name__))
  return tff.federated_mean(sensor_readings)
Getting traced, the argument is "Value".
You can think of Python code that defines a federated computation similarly to how you would think of Python code that builds a TensorFlow graph in a non-eager context (if you're not familiar with the non-eager uses of TensorFlow, think of your Python code defining a graph of operations to be executed later, but not actually running them on the fly). The non-eager graph-building code in TensorFlow is Python, but the TensorFlow graph constructed by this code is platform-independent and serializable.
Likewise, TFF computations are defined in Python, but the Python statements in
their bodies, such as tff.federated_mean in the example we've just shown,
are compiled into a portable and platform-independent serializable
representation under the hood.
As a developer, you don't need to concern yourself with the details of this
representation, as you will never need to directly work with it, but you should
be aware of its existence, the fact that TFF computations are fundamentally
non-eager, and cannot capture arbitrary Python state. Python code contained in a
TFF computation's body is executed at definition time, when the body of the
Python function decorated with tff.federated_computation is traced before
getting serialized. It is not retraced at invocation time (except when the
function is polymorphic; please refer to the documentation pages for details).
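The trace-once behavior can be imitated in a few lines of plain Python. The sketch below is only an analogy, under the assumption that a computation supports nothing but addition: the hypothetical decorator traced_computation calls the function body a single time with a symbolic Placeholder, records the operations, and replays them on later calls. None of these names are part of TFF, and the real serialized representation is far richer.

```python
class Placeholder:
    """Symbolic argument that records operations instead of running them."""

    def __init__(self, ops=()):
        self.ops = tuple(ops)

    def __add__(self, other):
        # Record the addition rather than performing it.
        return Placeholder(self.ops + (lambda value: value + other,))

    def run(self, value):
        # Replay the recorded operations on a concrete input.
        for op in self.ops:
            value = op(value)
        return value


trace_log = []


def traced_computation(fn):
    """Hypothetical decorator: traces fn once, at definition time."""
    recorded = fn(Placeholder())  # the Python body executes here, exactly once

    def invoke(value):
        return recorded.run(value)  # later calls only replay the recording

    return invoke


@traced_computation
def add_half_sketch(x):
    trace_log.append(type(x).__name__)  # side effect happens during tracing only
    return x + 0.5


results = [add_half_sketch(1.0), add_half_sketch(3.0)]
print(results, trace_log)
```

Although add_half_sketch is invoked twice, trace_log ends up with a single entry, because the Python body ran only when the decorator was applied.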
You may wonder why we've chosen to introduce a dedicated internal non-Python representation. One reason is that ultimately, TFF computations are intended to be deployable to real physical environments, and hosted on mobile or embedded devices, where Python may not be available.
Another reason is that TFF computations express the global behavior of
distributed systems, as opposed to Python programs which express the local
behavior of individual participants. You can see that in the simple example
above, with the special operator tff.federated_mean that accepts data on
client devices, but deposits the results on the server.
The operator tff.federated_mean cannot be easily modeled as an ordinary
operator in Python, since it doesn't execute locally - as noted earlier, it
represents a distributed system that coordinates the behavior of multiple system
participants. We will refer to such operators as federated operators, to
distinguish them from ordinary (local) operators in Python.
The TFF type system, and the fundamental set of operations supported in TFF's language, thus deviate significantly from those in Python, necessitating the use of a dedicated representation.
Composing federated computations
As noted above, federated computations and their constituents are best
understood as models of distributed systems, and you can think of composing
federated computations as composing more complex distributed systems from
simpler ones. You can think of the tff.federated_mean operator as a kind of
built-in template federated computation with a type signature
({T}@CLIENTS -> T@SERVER) (indeed, just like computations you write, this
operator also has a complex structure - under the hood we break it down into
simpler operators).
The same is true of composing federated computations. The computation
get_average_temperature may be invoked in a body of another Python function
decorated with tff.federated_computation - doing so will cause it to be
embedded in the body of the parent, much in the same way tff.federated_mean
was embedded in its own body earlier.
An important restriction to be aware of is that bodies of Python functions
decorated with tff.federated_computation must consist only of federated
operators, i.e., they cannot directly contain TensorFlow operations. For
example, you cannot directly use tf.nest interfaces to add a pair of
federated values. TensorFlow code must be confined to blocks of code decorated
with tff.tensorflow.computation, discussed in the following section. Only when
wrapped in this manner can the wrapped TensorFlow code be invoked in the body of
a tff.federated_computation.
The reasons for this separation are technical (it's hard to trick operators such
as tf.add to work with non-tensors) as well as architectural. The language of
federated computations (i.e., the logic constructed from serialized bodies of
Python functions decorated with tff.federated_computation) is designed to
serve as a platform-independent glue language. This glue language is currently
used to build distributed systems from embedded sections of TensorFlow code
(confined to tff.tensorflow.computation blocks). In the fullness of time, we
anticipate the need to embed sections of other, non-TensorFlow logic, such as
relational database queries that might represent input pipelines, all connected
together using the same glue language (the tff.federated_computation blocks).
TensorFlow logic
Declaring TensorFlow computations
TFF is designed for use with TensorFlow. As such, the bulk of the code you will
write in TFF is likely to be ordinary (i.e., locally-executing) TensorFlow code.
In order to use such code with TFF, as noted above, it just needs to be
decorated with tff.tensorflow.computation.
For example, here's how we could implement a function that takes a number and
adds 0.5 to it.
@tff.tensorflow.computation(np.float32)
def add_half(x):
  return tf.add(x, 0.5)
Once again, looking at this, you may be wondering why we should define another
decorator tff.tensorflow.computation instead of simply using an existing mechanism
such as tf.function. Unlike in the preceding section, here we are
dealing with an ordinary block of TensorFlow code.
There are a few reasons for this, the full treatment of which goes beyond the scope of this tutorial, but it's worth naming the main one:
- In order to embed reusable building blocks implemented using TensorFlow code in the bodies of federated computations, they need to satisfy certain properties - such as getting traced and serialized at definition time, having type signatures, etc. This generally requires some form of a decorator.
In general, we recommend using TensorFlow's native mechanisms for composition,
such as tf.function, wherever possible, as the exact manner in
which TFF's decorator interacts with eager functions can be expected to evolve.
Now, coming back to the example code snippet above, the computation add_half
we just defined can be treated by TFF just like any other TFF computation. In
particular, it has a TFF type signature.
str(add_half.type_signature)
'(float32 -> float32)'
Note this type signature does not have placements. TensorFlow computations cannot consume or return federated types.
You can now also use add_half as a building block in other computations. For
example, here's how you can use the tff.federated_map operator to apply
add_half pointwise to all member constituents of a federated float on client
devices.
@tff.federated_computation(tff.FederatedType(np.float32, tff.CLIENTS))
def add_half_on_clients(x):
  return tff.federated_map(add_half, x)

str(add_half_on_clients.type_signature)
'({float32}@CLIENTS -> {float32}@CLIENTS)'
Executing TensorFlow computations
Execution of computations defined with tff.tensorflow.computation follows the same
rules as those we described for tff.federated_computation. They can be invoked
as ordinary callables in Python, as follows.
add_half_on_clients([1.0, 3.0, 2.0])
[<tf.Tensor: shape=(), dtype=float32, numpy=1.5>, <tf.Tensor: shape=(), dtype=float32, numpy=3.5>, <tf.Tensor: shape=(), dtype=float32, numpy=2.5>]
Once again, it is worth noting that invoking the computation
add_half_on_clients in this manner simulates a distributed process. Data is
consumed on clients, and returned on clients. Indeed, this computation has each
client perform a local action. There is no tff.SERVER explicitly mentioned in
this system (even if in practice, orchestrating such processing might involve
one). Think of a computation defined this way as conceptually analogous to the
Map stage in MapReduce.
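The MapReduce analogy can be sketched in plain Python, with a list standing in for a {float32}@CLIENTS value. The names add_half_local and simulated_federated_map are hypothetical and only model the pointwise semantics; they do not reflect how TFF dispatches work to devices.

```python
def add_half_local(x):
    """Local logic that each client applies to its own value."""
    return x + 0.5


def simulated_federated_map(fn, client_values):
    """Plain-Python model of a pointwise map: {T}@CLIENTS -> {U}@CLIENTS."""
    # Each client transforms only its own item; no data crosses clients.
    return [fn(value) for value in client_values]


# "Map" stage: the result still lives on the clients, one item per client.
mapped = simulated_federated_map(add_half_local, [1.0, 3.0, 2.0])

# A "Reduce" stage would be a separate aggregation step, here a mean.
reduced = sum(mapped) / len(mapped)
print(mapped, reduced)
```

Keeping the two stages separate mirrors the type signatures seen so far: the map preserves the CLIENTS placement, while an aggregation such as a mean moves the result to a single place.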
Also, keep in mind that what we said in the preceding section about TFF
computations getting serialized at the definition time remains true for
tff.tensorflow.computation code as well - the Python body of add_half_on_clients
gets traced once at definition time. On subsequent invocations, TFF uses its
serialized representation.
The only difference between Python functions decorated with
tff.federated_computation and those decorated with tff.tensorflow.computation is
that the latter are serialized as TensorFlow graphs (whereas the former are not
allowed to contain TensorFlow code directly embedded in them).
Under the hood, each function decorated with tff.tensorflow.computation temporarily
disables eager execution in order to allow the computation's structure to be
captured. While eager execution is locally disabled, you are welcome to use
eager TensorFlow, AutoGraph, TensorFlow 2.0 constructs, etc., so long as you
write the logic of your computation in a manner such that it can get correctly
serialized.
For example, the following code will fail:
try:

  # Eager mode
  constant_10 = tf.constant(10.)

  @tff.tensorflow.computation(np.float32)
  def add_ten(x):
    return x + constant_10

except Exception as err:
  print(err)
Attempting to capture an EagerTensor without building a function.
The above fails because constant_10 has already been constructed outside of
the graph that tff.tensorflow.computation constructs internally in the body of
add_ten during the serialization process.
On the other hand, invoking Python functions that modify the current graph when
called inside a tff.tensorflow.computation is fine:
def get_constant_10():
  return tf.constant(10.)

@tff.tensorflow.computation(np.float32)
def add_ten(x):
  return x + get_constant_10()

add_ten(5.0)
15.0
Note that the serialization mechanisms in TensorFlow are evolving, and we expect the details of how TFF serializes computations to evolve as well.
Working with tf.data.Datasets
As noted earlier, a unique feature of tff.tensorflow.computations is that they
allow you to work with tf.data.Datasets defined abstractly as formal parameters
by your code. Parameters to be represented in TensorFlow as data sets need to be
declared using the tff.SequenceType constructor.
For example, the type specification tff.SequenceType(np.float32) defines an
abstract sequence of float elements in TFF. Sequences can contain either
tensors, or complex nested structures (we'll see examples of those later). The
concise representation of a sequence of T-typed items is T*.
float32_sequence = tff.SequenceType(np.float32)
str(float32_sequence)
'float32*'
Suppose that in our temperature sensor example, each sensor holds not just one
temperature reading, but multiple. Here's how you can define a TFF computation
in TensorFlow that calculates the average of temperatures in a single local data
set using the tf.data.Dataset.reduce
operator.
@tff.tensorflow.computation(tff.SequenceType(np.float32))
def get_local_temperature_average(local_temperatures):
  sum_and_count = (
      local_temperatures.reduce((0.0, 0), lambda x, y: (x[0] + y, x[1] + 1)))
  return sum_and_count[0] / tf.cast(sum_and_count[1], tf.float32)
str(get_local_temperature_average.type_signature)
'(float32* -> float32)'
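The reduce call in get_local_temperature_average is a left fold that carries a (sum, count) pair through the sequence. As a rough plain-Python sketch of the same fold, using functools.reduce in place of tf.data.Dataset.reduce (the name local_average and the sample readings are chosen here purely for illustration):

```python
from functools import reduce


def local_average(readings):
    """Folds a (sum, count) pair over the readings, then divides."""
    total, count = reduce(
        lambda acc, y: (acc[0] + y, acc[1] + 1),  # same step as the TFF lambda
        readings,
        (0.0, 0),  # initial state: zero sum, zero count
    )
    # Note: this plain-Python version raises ZeroDivisionError on an empty list.
    return total / count


print(local_average([68.5, 70.3, 69.8]))  # roughly 69.53
```

One difference worth noting: functools.reduce takes the initial state as its last argument, whereas the reduce call in the TFF computation above takes it first.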
In the body of a method decorated with tff.tensorflow.computation, formal
parameters of a TFF sequence type are represented simply as objects that behave
like tf.data.Dataset, i.e., support the same properties and methods (they are
currently not implemented as subclasses of that type - this may change as the
support for data sets in TensorFlow evolves).
You can easily verify this as follows.
@tff.tensorflow.computation(tff.SequenceType(np.int32))
def foo(x):
  return x.reduce(np.int32(0), lambda x, y: x + y)
foo([1, 2, 3])
6
Keep in mind that unlike ordinary tf.data.Datasets, these dataset-like objects
are placeholders. They don't contain any elements, since they represent abstract
sequence-typed parameters, to be bound to concrete data when used in a concrete
context. Support for abstractly-defined placeholder data sets is still somewhat
limited at this point, and in the early days of TFF, you may encounter certain
restrictions, but we won't need to worry about them in this tutorial (please
refer to the documentation pages for details).
When locally executing a computation that accepts a sequence in a simulation
mode, such as in this tutorial, you can feed the sequence as a Python list, as
below (as well as in other ways, e.g., as a tf.data.Dataset in eager mode, but
for now, we'll keep it simple).
get_local_temperature_average([68.5, 70.3, 69.8])
69.53333
Like all other TFF types, sequences like those defined above can use the
tff.StructType constructor to define nested structures. For example, here's how
one could declare a computation that accepts a sequence of pairs A, B, and
returns the sum of their products. We include a tracing statement in the body
of the computation so that you can see how the TFF type signature translates
into the dataset's element_spec.
@tff.tensorflow.computation(tff.SequenceType(collections.OrderedDict([('A', np.int32), ('B', np.int32)])))
def foo(ds):
  print('element_structure = {}'.format(ds.element_spec))
  return ds.reduce(np.int32(0), lambda total, x: total + x['A'] * x['B'])
element_structure = OrderedDict([('A', TensorSpec(shape=(), dtype=tf.int32, name=None)), ('B', TensorSpec(shape=(), dtype=tf.int32, name=None))])
str(foo.type_signature)
'(<A=int32,B=int32>* -> int32)'
foo([{'A': 2, 'B': 3}, {'A': 4, 'B': 5}])
26
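The result above can be cross-checked by hand. The following plain-Python sketch folds over the same two elements with functools.reduce (standing in for the dataset's reduce, as an illustration only) and accumulates the product of the A and B fields of each element:

```python
from functools import reduce

# The same two structured elements that were fed to foo.
pairs = [{'A': 2, 'B': 3}, {'A': 4, 'B': 5}]

# Start from 0 and add A * B for each element: 2 * 3 + 4 * 5.
sum_of_products = reduce(
    lambda total, x: total + x['A'] * x['B'], pairs, 0)

print(sum_of_products)  # 26
```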
The support for using tf.data.Datasets
as formal parameters is still somewhat
limited and evolving, although functional in simple scenarios such as those used
in this tutorial.
Putting it all together
Now, let's try again to use our TensorFlow computation in a federated setting. Suppose we have a group of sensors that each have a local sequence of temperature readings. We can compute the global temperature average by averaging the sensors' local averages as follows.
@tff.federated_computation(
    tff.FederatedType(tff.SequenceType(np.float32), tff.CLIENTS))
def get_global_temperature_average(sensor_readings):
  return tff.federated_mean(
      tff.federated_map(get_local_temperature_average, sensor_readings))
Note that this isn't a simple average across all local temperature readings from
all clients, as that would require weighting contributions from different
clients by the number of readings they locally maintain. We leave it as an
exercise for the reader to update the above code; the tff.federated_mean
operator accepts the weight as an optional second argument (expected to be a
federated float).
Also note that the input to get_global_temperature_average now becomes a
federated float sequence. Federated sequences are how we will typically
represent on-device data in federated learning, with sequence elements typically
representing data batches (you will see examples of this shortly).
str(get_global_temperature_average.type_signature)
'({float32*}@CLIENTS -> float32@SERVER)'
Here's how we can locally execute the computation on a sample of data in Python.
Notice that the way we supply the input is now as a list of lists. The outer
list iterates over the devices in the group represented by tff.CLIENTS, and
the inner ones iterate over elements in each device's local sequence.
get_global_temperature_average([[68.0, 70.0], [71.0], [68.0, 72.0, 70.0]])
70.0
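To see why the mean of local averages differs from a simple average over all readings, the arithmetic can be worked through in plain Python on the same input. This sketch does not use TFF. It only shows the numbers that an unweighted and a count-weighted mean would produce, with the per-client reading count playing the role of the weight (the variable names are chosen here for illustration).

```python
sensor_readings = [[68.0, 70.0], [71.0], [68.0, 72.0, 70.0]]

# Per-client local averages and reading counts.
local_averages = [sum(r) / len(r) for r in sensor_readings]  # [69.0, 71.0, 70.0]
counts = [len(r) for r in sensor_readings]                   # [2, 1, 3]

# Unweighted mean of the local averages: every client counts equally.
mean_of_means = sum(local_averages) / len(local_averages)

# Mean of the local averages weighted by each client's reading count.
weighted_mean = (
    sum(avg * n for avg, n in zip(local_averages, counts)) / sum(counts))

# Simple average over all readings pooled together.
pooled_mean = sum(sum(r) for r in sensor_readings) / sum(counts)

print(mean_of_means)  # 70.0
print(weighted_mean)  # about 69.83, i.e. 419 / 6
print(pooled_mean)    # matches the weighted mean
```

The client with a single reading of 71.0 pulls the unweighted result up to 70.0, while weighting by reading count recovers the pooled average of roughly 69.83.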
This concludes the first part of the tutorial. We encourage you to continue on to the second part.