tfp.experimental.stats.RunningCovariance

A running covariance computation.

Inherits From: AutoCompositeTensor

tfp.experimental.stats.RunningCovariance(
    num_samples,
    mean,
    sum_squared_residuals,
    event_ndims,
    name='RunningCovariance'
)

Used in the tutorials: TFP Release Notes notebook (0.12.1)

The running covariance computation supports batching. The event_ndims parameter gives the number of trailing dimensions to treat as the event, across which covariance is computed. Any leading dimensions are treated as batch shape, and no cross terms are computed between batch members.

For example, if the incoming samples have shape [5, 7], event_ndims selects among three different covariance computations:

event_ndims=0 treats the samples as a [5, 7] batch of scalar random variables, and computes their variances in batch. The shape of the result is [5, 7].
event_ndims=1 treats the samples as a [5] batch of vector random variables of shape [7], and computes their covariances in batch. The shape of the result is [5, 7, 7].
event_ndims=2 treats the samples as a single random variable of shape [5, 7] and computes its covariance. The shape of the result is [5, 7, 5, 7].
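
The shape rule in the list above can be sketched in plain Python. This is only an illustration of the bookkeeping, not TFP code: the helper name `covariance_result_shape` is hypothetical, and treating `None` as "every dimension is part of the event" follows the description of `event_ndims=None` given for `from_example` and `from_shape`.

```python
def covariance_result_shape(sample_shape, event_ndims=None):
    """Shape of the covariance result for samples of `sample_shape`.

    Hypothetical helper for illustration; it is not part of TFP.
    """
    sample_shape = tuple(sample_shape)
    if event_ndims is None:
        # No batching: every dimension belongs to the event.
        event_ndims = len(sample_shape)
    if event_ndims > len(sample_shape):
        raise ValueError("event_ndims is greater than the rank of the sample")
    split = len(sample_shape) - event_ndims
    batch_shape, event_shape = sample_shape[:split], sample_shape[split:]
    # One entry per pair of event coordinates, for every batch member.
    return batch_shape + event_shape + event_shape


for event_ndims in (0, 1, 2):
    print(event_ndims, covariance_result_shape((5, 7), event_ndims))
# 0 (5, 7)
# 1 (5, 7, 7)
# 2 (5, 7, 5, 7)
```

The batch dimensions appear once in the result because batch members never interact, while the event dimensions appear twice because every event coordinate is paired with every other.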

RunningCovariance is meant to serve general streaming covariance needs. For a specialized version that fits streaming over MCMC samples, see CovarianceReducer in tfp.experimental.mcmc.

Methods

`covariance`

covariance(
    ddof=0
)

Returns the covariance accumulated so far.

Args
`ddof`	Delta degrees of freedom for the covariance calculation, that is, the amount subtracted from the sample count in the divisor. For example, use `ddof=0` for population covariance and `ddof=1` for sample covariance. Defaults to `0` (population covariance).

Returns
`covariance`	An estimate of the covariance.
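
To make the effect of `ddof` concrete, here is a scalar sketch in plain Python. It assumes the usual convention that the accumulated sum of squared residuals is divided by the sample count minus `ddof`; the function name `covariance_from_residuals` is hypothetical and is not TFP's implementation.

```python
import statistics


def covariance_from_residuals(sum_squared_residuals, num_samples, ddof=0):
    """Assumed convention: divide the residual sum by (num_samples - ddof)."""
    return sum_squared_residuals / (num_samples - ddof)


samples = [1.0, 2.0, 3.0, 4.0]
mean = sum(samples) / len(samples)
sum_squared_residuals = sum((x - mean) ** 2 for x in samples)

population = covariance_from_residuals(sum_squared_residuals, len(samples), ddof=0)
sample = covariance_from_residuals(sum_squared_residuals, len(samples), ddof=1)

# Cross-check against the standard library's population and sample variance.
print(population, statistics.pvariance(samples))
print(sample, statistics.variance(samples))
```

With `ddof=0` the divisor is the number of samples; with `ddof=1` it is one less, which gives the unbiased sample estimate.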

`from_example`

@classmethod
from_example(
    example, event_ndims=None, name='RunningCovariance'
)

Starts a RunningCovariance from an example.

Args
`example`	A `Tensor`. The `RunningCovariance` will accept samples of the same dtype and broadcast-compatible shape as the example.
`event_ndims`	Number of dimensions that specify the event shape, from the inner-most dimensions. Specifying `None` returns all cross product terms (no batching) and is the default.
`name`	Python `str` name prefixed to Ops created by this class.

Returns
`cov`	An empty `RunningCovariance`, ready for incoming samples. Note that by convention, the supplied example is used only for initialization, but not counted as a sample.

Raises
`ValueError`	if `event_ndims` is greater than the rank of the example.

`from_shape`

@classmethod
from_shape(
    shape=(),
    dtype=tf.float32,
    event_ndims=None,
    name='RunningCovariance'
)

Starts a RunningCovariance from shape and dtype metadata.

Args
`shape`	Python `Tuple` or `TensorShape` representing the shape of incoming samples. This is useful to supply if the `RunningCovariance` will be carried by a `tf.while_loop`, so that broadcasting does not change the shape across loop iterations.
`dtype`	Dtype of incoming samples and the resulting statistics. By default, the dtype is `tf.float32`. Any integer dtype will be cast to the corresponding float dtype (e.g. `tf.int32` will be cast to `tf.float32`), because the intermediate calculations require floating-point division.
`event_ndims`	Number of dimensions that specify the event shape, from the inner-most dimensions. Specifying `None` returns all cross product terms (no batching) and is the default.
`name`	Python `str` name prefixed to Ops created by this class.

Returns
`cov`	An empty `RunningCovariance`, ready for incoming samples.

Raises
`ValueError`	if `event_ndims` is greater than the rank of the intended incoming samples (operation is extraneous).

`tree_flatten`

tree_flatten()

`tree_unflatten`

@classmethod
tree_unflatten(
    metadata, tensors
)

`update`

update(
    new_sample, axis=None
)

Update the RunningCovariance with a new sample.

The update formula is from Philippe Pebay (2008) [1]. This implementation supports both batched and chunked covariance computation. A "batch" is the usual parallel computation: a batch of size N implies N independent covariance computations, each stepping one sample (or chunk) at a time. A "chunk" of size M implies incorporating M samples into a single covariance computation at once, which is more efficient than incorporating them one at a time.

To further illustrate the difference between batching and chunking, consider the following example:

import tensorflow as tf
import tensorflow_probability as tfp

# Batching: 5 independent vector random variables, each of shape (2,).
# Chunking: axis 0 holds 3 samples for each of them, folded in by one update.
sample = tf.ones((3, 5, 2))
running_cov = tfp.experimental.stats.RunningCovariance.from_shape(
    (5, 2), event_ndims=1)
running_cov = running_cov.update(sample, axis=0)
final_cov = running_cov.covariance()
final_cov.shape  # (5, 2, 2)
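
The idea behind chunked updates can also be sketched without TensorFlow. The plain-Python code below merges a running state with the statistics of a chunk using the pairwise combination formula from the Pebay report [1], for a single (unbatched) vector random variable. All function names here (`empty_state`, `chunk_state`, `merge`, `update`, `covariance`) are hypothetical and written for this illustration; this is not TFP's implementation.

```python
def outer(u, v):
    """Outer product of two equal-length vectors, as nested lists."""
    return [[ui * vj for vj in v] for ui in u]


def empty_state(dim):
    """State with no samples: (count, mean, sum of residual products)."""
    return 0, [0.0] * dim, [[0.0] * dim for _ in range(dim)]


def chunk_state(chunk):
    """Two-pass statistics of a chunk given as a list of equal-length vectors."""
    n, dim = len(chunk), len(chunk[0])
    mean = [sum(x[i] for x in chunk) / n for i in range(dim)]
    resid = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in chunk)
              for j in range(dim)] for i in range(dim)]
    return n, mean, resid


def merge(state_a, state_b):
    """Combine two states; state_b must contain at least one sample."""
    n_a, mean_a, c_a = state_a
    n_b, mean_b, c_b = state_b
    n = n_a + n_b
    delta = [mb - ma for ma, mb in zip(mean_a, mean_b)]
    mean = [ma + d * n_b / n for ma, d in zip(mean_a, delta)]
    # Cross term that accounts for the shift between the two means.
    weight = n_a * n_b / n
    correction = outer(delta, delta)
    dim = len(mean)
    resid = [[c_a[i][j] + c_b[i][j] + weight * correction[i][j]
              for j in range(dim)] for i in range(dim)]
    return n, mean, resid


def update(state, chunk):
    """Fold a whole chunk of samples into the running state at once."""
    return merge(state, chunk_state(chunk))


def covariance(state, ddof=0):
    n, _, resid = state
    return [[r / (n - ddof) for r in row] for row in resid]


samples = [[1.0, 2.0], [2.0, 1.0], [4.0, 3.0], [3.0, 6.0]]

# One sample at a time: four updates with chunks of size 1.
one_by_one = empty_state(2)
for x in samples:
    one_by_one = update(one_by_one, [x])

# One chunk of size 4: a single update.
chunked = update(empty_state(2), samples)

print(covariance(one_by_one))
print(covariance(chunked))
```

Both paths agree up to floating-point rounding on the population covariance [[1.25, 1.0], [1.0, 3.5]]; the chunked path reaches it in a single update.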

Args
`new_sample`	Incoming sample with shape and dtype compatible with those used to form this `RunningCovariance`.
`axis`	If chunking is desired, this is an integer that specifies the axis with chunked samples. For individual samples, set this to `None`. By default, samples are not chunked (`axis` is None).

Returns
`cov`	Newly allocated `RunningCovariance` updated to include `new_sample`.

References

[1]: Philippe Pebay. Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments. Technical Report SAND2008-6212, 2008. https://prod-ng.sandia.gov/techlib-noauth/access-control.cgi/2008/086212.pdf