Interface for differentially private query mechanisms.
Differential privacy is achieved by processing records to bound sensitivity, accumulating the processed records (usually by summing them) and then adding noise to the aggregated result. The process can be repeated to compose applications of the same mechanism, possibly with different parameters.
The DPQuery interface specifies a functional approach to this process. A global state object holds any information that must persist across applications of the mechanism. For each application, the following steps are performed:
- Use the global state to derive parameters to use for the next sample of records.
- Initialize a sample state that will accumulate processed records.
- For each record: (a) process the record, then (b) accumulate the processed record into the sample state.
- Get the result of the mechanism, possibly updating the global state to use in the next application.
- Derive metrics from the global state.
Here is an example using the `GaussianSumQuery`. Assume there is some function `records_for_round(round)` that returns an iterable of records to use on a given round, and that `num_rounds` is defined.
```python
import tensorflow_privacy

dp_query = tensorflow_privacy.GaussianSumQuery(
    l2_norm_clip=1.0, stddev=1.0)

global_state = dp_query.initial_global_state()

for round in range(num_rounds):
  sample_params = dp_query.derive_sample_params(global_state)
  sample_state = dp_query.initial_sample_state()
  for record in records_for_round(round):
    sample_state = dp_query.accumulate_record(
        sample_params, sample_state, record)
  # get_noised_result returns a (result, new_global_state, event) tuple;
  # the event feeds privacy accounting.
  result, global_state, event = dp_query.get_noised_result(
      sample_state, global_state)
  metrics = dp_query.derive_metrics(global_state)
  # Do something with result and metrics...
```
Methods
accumulate_preprocessed_record
```python
@abc.abstractmethod
accumulate_preprocessed_record(
    sample_state, preprocessed_record
)
```
Accumulates a single preprocessed record into the sample state.
This method is intended to only do simple aggregation, typically just a sum. In the future, we might remove this method and replace it with a way to declaratively specify the type of aggregation required.
Args | |
---|---|
`sample_state` | The current sample state. In standard DP-SGD training, the accumulated sum of previous clipped microbatch gradients. |
`preprocessed_record` | The preprocessed record to accumulate. |

Returns | |
---|---|
The updated sample state. |
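
For sum-aggregated queries, this method typically reduces to an element-wise addition over the nested structures. A minimal sketch (not the library's implementation), assuming the sample state and preprocessed record have matching structures:

```python
import tensorflow as tf

def accumulate_preprocessed_record(self, sample_state, preprocessed_record):
  # Element-wise sum of the matching leaves of the two nested structures.
  return tf.nest.map_structure(tf.add, sample_state, preprocessed_record)
```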
accumulate_record
```python
accumulate_record(
    params, sample_state, record
)
```
Accumulates a single record into the sample state.
This is a helper method that simply delegates to `preprocess_record` and `accumulate_preprocessed_record` for the common case when both of those functions run on a single device. The accumulation itself is typically a simple sum.
Args | |
---|---|
`params` | The parameters for the sample. In standard DP-SGD training, the clipping norm for the sample's microbatch gradients (i.e., a maximum norm magnitude to which each gradient is clipped). |
`sample_state` | The current sample state. In standard DP-SGD training, the accumulated sum of previous clipped microbatch gradients. |
`record` | The record to accumulate. In standard DP-SGD training, the gradient computed for the examples in one microbatch, which may be the gradient for just one example (for microbatches of size 1). |

Returns | |
---|---|
The updated sample state. In standard DP-SGD training, the sum of the previous microbatch gradients with the addition of the record argument. |
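
Since this is a convenience helper, a straightforward implementation is just the composition of the two methods it delegates to. A sketch:

```python
def accumulate_record(self, params, sample_state, record):
  # Preprocess (e.g., clip) the record, then fold it into the running state.
  preprocessed_record = self.preprocess_record(params, record)
  return self.accumulate_preprocessed_record(sample_state, preprocessed_record)
```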
derive_metrics
```python
derive_metrics(
    global_state
)
```
Derives metric information from the current global state.
Any metrics returned should be derived only from privatized quantities.
Args | |
---|---|
`global_state` | The global state from which to derive metrics. |

Returns | |
---|---|
A `collections.OrderedDict` mapping string metric names to tensor values. |
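
As an illustration, a clipping-based query might report its current clipping norm, which is set by the mechanism rather than by the data. The `l2_norm_clip` field here is a hypothetical attribute of the global state, not part of the interface:

```python
import collections

def derive_metrics(self, global_state):
  # The clipping norm is a public mechanism parameter, not derived from raw
  # records, so it is safe to report. `l2_norm_clip` is hypothetical.
  return collections.OrderedDict(clip=global_state.l2_norm_clip)
```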
derive_sample_params
```python
derive_sample_params(
    global_state
)
```
Given the global state, derives parameters to use for the next sample.
For example, if the mechanism needs to clip records to bound the norm, the clipping norm should be part of the sample params. In a distributed context, this is the part of the state that would be sent to the workers so they can process records.
Args | |
---|---|
`global_state` | The current global state. |

Returns | |
---|---|
Parameters to use to process records in the next sample. |
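
For example, a clipping-based query might expose only the clipping norm to workers. A sketch, again assuming a hypothetical `l2_norm_clip` field on the global state:

```python
def derive_sample_params(self, global_state):
  # Workers only need the clipping norm to preprocess their records.
  return global_state.l2_norm_clip
```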
get_noised_result
```python
@abc.abstractmethod
get_noised_result(
    sample_state, global_state
)
```
Gets the query result after all records of sample have been accumulated.
The global state can also be updated for use in the next application of the DP mechanism.
Args | |
---|---|
`sample_state` | The sample state after all records have been accumulated. In standard DP-SGD training, the accumulated sum of clipped microbatch gradients (in the special case of microbatches of size 1, the clipped per-example gradients). |
`global_state` | The global state, storing long-term privacy bookkeeping. |
Returns | |
---|---|
A tuple `(result, new_global_state, event)`, where `result` is the noised result of the query, `new_global_state` is the updated global state to use in the next application, and `event` is a `DpEvent` describing this application of the mechanism for privacy accounting. |
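
A sketch of a Gaussian-noise version, assuming a global state with hypothetical `l2_norm_clip` and `stddev` fields and using the `dp_accounting` package for the returned event:

```python
import dp_accounting
import tensorflow as tf

def get_noised_result(self, sample_state, global_state):
  # Add independent Gaussian noise to every leaf of the accumulated sum.
  def add_noise(v):
    return v + tf.random.normal(tf.shape(v), stddev=global_state.stddev)

  result = tf.nest.map_structure(add_noise, sample_state)
  # The event records the effective noise multiplier for accounting.
  event = dp_accounting.GaussianDpEvent(
      noise_multiplier=global_state.stddev / global_state.l2_norm_clip)
  return result, global_state, event
```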
initial_global_state
```python
initial_global_state()
```
Returns the initial global state for the DPQuery.
The global state contains any state information that changes across repeated applications of the mechanism. The default implementation returns just an empty tuple for implementing classes that do not have any persistent state.
This object must be processable via `tf.nest.map_structure`.
Returns | |
---|---|
The global state. |
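
For example, a Gaussian sum mechanism might keep its clipping norm and noise scale in a flat namedtuple, which `tf.nest.map_structure` can traverse. A sketch with hypothetical field names:

```python
import collections

# Hypothetical persistent state: clipping norm and noise scale.
_GlobalState = collections.namedtuple('_GlobalState', ['l2_norm_clip', 'stddev'])

def initial_global_state(self):
  return _GlobalState(l2_norm_clip=1.0, stddev=1.0)
```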
initial_sample_state
```python
@abc.abstractmethod
initial_sample_state(
    template=None
)
```
Returns an initial state to use for the next sample.
For typical `DPQuery` classes that are aggregated by summation, this should return a nested structure of zero tensors of the appropriate shapes, to which processed records will be aggregated.
Args | |
---|---|
`template` | A nested structure of tensors, `TensorSpec`s, or numpy arrays used as a template to create the initial sample state. It is assumed that the leaves of the structure are Python scalars or some type that has properties `shape` and `dtype`. |
Returns | |
---|---|
An initial sample state. |
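
For a sum-aggregated query, a sketch that builds a structure of zeros matching the supplied template (assuming each leaf exposes `shape` and `dtype`):

```python
import tensorflow as tf

def initial_sample_state(self, template=None):
  # One zero tensor per leaf, matching the template's shapes and dtypes.
  return tf.nest.map_structure(
      lambda t: tf.zeros(t.shape, t.dtype), template)
```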
merge_sample_states
```python
@abc.abstractmethod
merge_sample_states(
    sample_state_1, sample_state_2
)
```
Merges two sample states into a single state.
This can be useful if aggregation is performed hierarchically, where multiple sample states are used to accumulate records and then hierarchically merged into the final accumulated state. Typically this will be a simple sum.
Args | |
---|---|
`sample_state_1` | The first sample state to merge. |
`sample_state_2` | The second sample state to merge. |

Returns | |
---|---|
The merged sample state. |
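
For sum aggregation, merging is again an element-wise addition. A minimal sketch:

```python
import tensorflow as tf

def merge_sample_states(self, sample_state_1, sample_state_2):
  # Merge two partial sums by adding their matching leaves.
  return tf.nest.map_structure(tf.add, sample_state_1, sample_state_2)
```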
preprocess_record
```python
preprocess_record(
    params, record
)
```
Preprocesses a single record.
This preprocessing is applied to one client's record, e.g. selecting vectors and clipping them to a fixed L2 norm. This method can be executed in a separate TF session, or even on a different machine, so it should not depend on any TF inputs other than those provided as input arguments. In particular, implementations should avoid accessing any TF tensors or variables that are stored in `self`.
Args | |
---|---|
`params` | The parameters for the sample. In standard DP-SGD training, the clipping norm for the sample's microbatch gradients (i.e., a maximum norm magnitude to which each gradient is clipped). |
`record` | The record to be processed. In standard DP-SGD training, the gradient computed for the examples in one microbatch, which may be the gradient for just one example (for microbatches of size 1). |

Returns | |
---|---|
A structure of tensors to be aggregated. |
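
A sketch of L2 clipping, assuming `params` is the clipping norm itself (as in the clipping-based queries):

```python
import tensorflow as tf

def preprocess_record(self, params, record):
  # Clip the record's global L2 norm to the bound carried in `params`.
  l2_norm_clip = params
  flat_record = tf.nest.flatten(record)
  clipped, _ = tf.clip_by_global_norm(flat_record, l2_norm_clip)
  return tf.nest.pack_sequence_as(record, clipped)
```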