A running covariance computation.
Inherits From: AutoCompositeTensor
tfp.experimental.stats.RunningCovariance(
    num_samples,
    mean,
    sum_squared_residuals,
    event_ndims,
    name='RunningCovariance'
)
The running covariance computation supports batching. The event_ndims
parameter indicates the number of trailing dimensions to treat as part of
the event, and to compute covariance across. The leading dimensions, if
any, are treated as batch shape, and no cross terms are computed.
For example, if the incoming samples have shape [5, 7], the event_ndims
parameter selects among three different covariance computations, sketched in
code after this list:
- event_ndims=0 treats the samples as a [5, 7] batch of scalar random variables, and computes their variances in batch. The shape of the result is [5, 7].
- event_ndims=1 treats the samples as a [5] batch of vector random variables of shape [7], and computes their covariances in batch. The shape of the result is [5, 7, 7].
- event_ndims=2 treats the samples as a single random variable of shape [5, 7] and computes its covariance. The shape of the result is [5, 7, 5, 7].
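As a rough sketch of the three choices (using the from_shape and update methods
documented below; the random sample is illustrative), the resulting shapes are:

import tensorflow as tf
import tensorflow_probability as tfp

sample = tf.random.normal((5, 7))

# event_ndims=0: variances of a [5, 7] batch of scalars.
cov0 = tfp.experimental.stats.RunningCovariance.from_shape(
    (5, 7), event_ndims=0)
cov0 = cov0.update(sample)
cov0.covariance().shape  # (5, 7)

# event_ndims=1: a [5] batch of [7, 7] covariance matrices.
cov1 = tfp.experimental.stats.RunningCovariance.from_shape(
    (5, 7), event_ndims=1)
cov1 = cov1.update(sample)
cov1.covariance().shape  # (5, 7, 7)

# event_ndims=2: one covariance over the full [5, 7] event.
cov2 = tfp.experimental.stats.RunningCovariance.from_shape(
    (5, 7), event_ndims=2)
cov2 = cov2.update(sample)
cov2.covariance().shape  # (5, 7, 5, 7)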
RunningCovariance is meant to serve general streaming covariance needs.
For a specialized version that fits streaming over MCMC samples, see
CovarianceReducer in tfp.experimental.mcmc.
Methods
covariance
covariance(
    ddof=0
)
Returns the covariance accumulated so far.
| Args | |
|---|---|
| ddof | Requested dynamic degrees of freedom for the covariance calculation. For example, use ddof=0 for population covariance and ddof=1 for sample covariance. Defaults to the population covariance. | 
| Returns | |
|---|---|
| covariance | An estimate of the covariance. | 
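For instance, assuming running_cov is a RunningCovariance that has already
absorbed some samples, the two conventions look like:

population_cov = running_cov.covariance()         # ddof=0: divide by num_samples
sample_cov = running_cov.covariance(ddof=1)       # divide by num_samples - 1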
from_example
@classmethod
from_example(
    example, event_ndims=None, name='RunningCovariance'
)
Starts a RunningCovariance from an example.
| Args | |
|---|---|
| example | A Tensor. The RunningCovariance will accept samples of the same dtype and broadcast-compatible shape as the example. | 
| event_ndims | Number of dimensions that specify the event shape, from the inner-most dimensions. Specifying None returns all cross product terms (no batching) and is the default. | 
| name | Python str name prefixed to Ops created by this class. | 
| Returns | |
|---|---|
| cov | An empty RunningCovariance, ready for incoming samples.  Note
that by convention, the supplied example is used only for
initialization, but not counted as a sample. | 
| Raises | |
|---|---|
| ValueError | if event_ndims is greater than the rank of the example. | 
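A minimal sketch (the example tensor and shapes are illustrative):

import tensorflow as tf
import tensorflow_probability as tfp

example = tf.zeros((5, 2))
cov = tfp.experimental.stats.RunningCovariance.from_example(
    example, event_ndims=1)
# The example only fixes dtype and shape; it is not counted as a sample.
cov = cov.update(tf.ones((5, 2)))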
from_shape
@classmethod
from_shape(
    shape=(), dtype=tf.float32, event_ndims=None, name='RunningCovariance'
)
Starts a RunningCovariance from shape and dtype metadata.
| Args | |
|---|---|
| shape | Python Tuple or TensorShape representing the shape of incoming samples. This is useful to supply if the RunningCovariance will be carried by a tf.while_loop, so that broadcasting does not change the shape across loop iterations. | 
| dtype | Dtype of incoming samples and the resulting statistics. By default, the dtype is tf.float32. Any integer dtypes will be cast to corresponding floats (i.e. tf.int32 will be cast to tf.float32), as intermediate calculations should be performing floating-point division. | 
| event_ndims | Number of dimensions that specify the event shape, from the inner-most dimensions. Specifying None returns all cross product terms (no batching) and is the default. | 
| name | Python str name prefixed to Ops created by this class. | 
| Returns | |
|---|---|
| cov | An empty RunningCovariance, ready for incoming samples. | 
| Raises | |
|---|---|
| ValueError | if event_ndims is greater than the rank of the intended incoming samples (operation is extraneous). | 
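A rough sketch of the tf.while_loop use case mentioned above; the sample tensor,
loop body, and variable names are illustrative, not part of the API:

import tensorflow as tf
import tensorflow_probability as tfp

samples = tf.random.normal((100, 5, 2))
cov = tfp.experimental.stats.RunningCovariance.from_shape(
    (5, 2), dtype=tf.float32, event_ndims=1)

def body(i, cov):
  # Incorporate one sample per iteration; shapes stay fixed across iterations.
  return i + 1, cov.update(samples[i])

_, cov = tf.while_loop(
    cond=lambda i, _: i < 100,
    body=body,
    loop_vars=(0, cov))
final = cov.covariance()  # shape (5, 2, 2)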
tree_flatten
tree_flatten()
tree_unflatten
@classmethod
tree_unflatten(
    metadata, tensors
)
update
update(
    new_sample, axis=None
)
Update the RunningCovariance with a new sample.
The update formula is from Philippe Pebay (2008) [1]. This implementation supports both batched and chunked covariance computation. A "batch" is the usual parallel computation: a batch of size N implies N independent covariance computations, each stepping one sample (or chunk) at a time. A "chunk" of size M implies incorporating M samples into a single covariance computation at once, which is more efficient than incorporating them one by one.
To further illustrate the difference between batching and chunking, consider the following example:
import tensorflow as tf
import tensorflow_probability as tfp

# treat as 3 samples from each of 5 independent vector random variables of
# shape (2,)
sample = tf.ones((3, 5, 2))
running_cov = tfp.experimental.stats.RunningCovariance.from_shape(
    (5, 2), event_ndims=1)
running_cov = running_cov.update(sample, axis=0)
final_cov = running_cov.covariance()
final_cov.shape # (5, 2, 2)
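For contrast, a sketch of the equivalent un-chunked computation, feeding the
three samples one at a time with the default axis=None:

running_cov = tfp.experimental.stats.RunningCovariance.from_shape(
    (5, 2), event_ndims=1)
for i in range(3):
  running_cov = running_cov.update(sample[i])  # one sample per update
final_cov = running_cov.covariance()           # same (5, 2, 2) result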
| Args | |
|---|---|
| new_sample | Incoming sample with shape and dtype compatible with those
used to form this RunningCovariance. | 
| axis | If chunking is desired, this is an integer that specifies the axis with chunked samples. For individual samples, set this to None. By default, samples are not chunked (axis is None). | 
| Returns | |
|---|---|
| cov | Newly allocated RunningCovariance updated to include new_sample. | 
References
[1]: Philippe Pebay. Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments. Technical Report SAND2008-6212, 2008. https://prod-ng.sandia.gov/techlib-noauth/access-control.cgi/2008/086212.pdf