Sample correlation (Pearson) between observations indexed by `event_axis`.
tfp.substrates.jax.stats.correlation(
x, y=None, sample_axis=0, event_axis=-1, keepdims=False, name=None
)
Given `N` samples of scalar random variables `X` and `Y`, correlation may be estimated as
Corr[X, Y] := Cov[X, Y] / Sqrt(Cov[X, X] * Cov[Y, Y]),
where
Cov[X, Y] := N^{-1} sum_{n=1}^N (X_n - Xbar) Conj{(Y_n - Ybar)}
Xbar := N^{-1} sum_{n=1}^N X_n
Ybar := N^{-1} sum_{n=1}^N Y_n
Correlation is always in the interval `[-1, 1]`, and `Corr[X, X] == 1`.
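The estimator above can be checked by hand. The following is a minimal plain-Python sketch for real-valued samples (so the conjugate is a no-op); the helper names `sample_cov` and `sample_corr` are hypothetical and are not part of TFP.

```python
import math


def sample_cov(xs, ys):
    """Cov[X, Y] with the N^{-1} normalization from the formula above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def sample_corr(xs, ys):
    """Corr[X, Y] := Cov[X, Y] / Sqrt(Cov[X, X] * Cov[Y, Y])."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 0.5, -2.0]

r_xy = sample_corr(xs, ys)  # lies in [-1, 1]; negative here since ys falls as xs rises
r_xx = sample_corr(xs, xs)  # equals 1 up to floating-point rounding
```

The sketch assumes both sequences have the same length and nonzero variance; a constant sequence would make the denominator zero.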
For vector-variate random variables `X = (X1, ..., Xd)`, `Y = (Y1, ..., Yd)`, one is often interested in the correlation matrix, `C_{ij} := Corr[Xi, Yj]`.
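To make the matrix definition concrete, here is a small plain-Python sketch that builds `C[i][j] = Corr[Xi, Yj]` from a list of vector samples. It illustrates the definition only, not how TFP computes it; `corr` and `corr_matrix` are hypothetical helper names, and the data is made up for the example.

```python
import math


def corr(xs, ys):
    """Pearson correlation of two equal-length real sequences (1/N normalization)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    c_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    c_xx = sum((x - x_bar) ** 2 for x in xs) / n
    c_yy = sum((y - y_bar) ** 2 for y in ys) / n
    return c_xy / math.sqrt(c_xx * c_yy)


def corr_matrix(x_samples, y_samples):
    """C[i][j] = Corr[Xi, Yj]; each sample is a length-d tuple."""
    x_components = list(zip(*x_samples))  # x_components[i] holds all samples of Xi
    y_components = list(zip(*y_samples))  # y_components[j] holds all samples of Yj
    return [[corr(xi, yj) for yj in y_components] for xi in x_components]


# Four samples of 2-dimensional X and Y. Y1 is exactly 2 * X1.
x_samples = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0), (4.0, 3.0)]
y_samples = [(2.0, 5.0), (4.0, 1.0), (6.0, 2.0), (8.0, 0.0)]

C = corr_matrix(x_samples, y_samples)  # 2 x 2 nested list
```

Because `Y1` is a positive multiple of `X1`, the entry `C[0][0]` comes out as 1 up to rounding, while the other entries fall strictly inside the interval.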
import jax
from tensorflow_probability.substrates import jax as tfp

# JAX random draws are stateless, so each call needs its own explicit key.
key_x, key_y = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(key_x, shape=(100, 2, 3))
y = jax.random.normal(key_y, shape=(100, 2, 3))
# corr[i, j] is the sample correlation between x[:, i, j] and y[:, i, j].
corr = tfp.stats.correlation(x, y, sample_axis=0, event_axis=None)
# corr_matrix[i, m, n] is the sample correlation of x[:, i, m] and y[:, i, n]
corr_matrix = tfp.stats.correlation(x, y, sample_axis=0, event_axis=-1)
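Notice the covariance terms divide by `N` (the default of `np.var` and `np.std`, though not of `np.cov`), which does not create `NaN` in those terms when `N = 1`, but is slightly biased. Since the same factor appears in the numerator and the denominator of `Corr[X, Y]`, it cancels in the ratio.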
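The effect of the normalization can be seen in a short plain-Python sketch. The `ddof` argument is a hypothetical parameter of this example (named after the NumPy convention), with `ddof=0` dividing by `N` and `ddof=1` dividing by `N - 1`; it is not an argument of `tfp.stats.correlation`.

```python
import math


def cov(xs, ys, ddof=0):
    """Sample covariance dividing by (N - ddof); ddof=0 matches the formula above."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - ddof)


def corr(xs, ys, ddof=0):
    """Correlation built from covariances that all share one normalization."""
    return cov(xs, ys, ddof) / math.sqrt(cov(xs, xs, ddof) * cov(ys, ys, ddof))


xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 0.5, -2.0]
n = len(xs)

cov_biased = cov(xs, ys, ddof=0)    # divides by N
cov_unbiased = cov(xs, ys, ddof=1)  # divides by N - 1, larger in magnitude

corr_biased = corr(xs, ys, ddof=0)
corr_unbiased = corr(xs, ys, ddof=1)  # same value: the factor cancels in the ratio

# With a single sample, dividing by N gives a finite covariance of 0.0,
# whereas dividing by N - 1 would be a division by zero.
single_cov = cov([2.0], [5.0], ddof=0)
```

The two covariance estimates differ by the factor `N / (N - 1)`, while the two correlation values agree up to rounding.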
Returns | |
---|---|
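`corr` | A `Tensor` of the same `dtype` as `x`, and rank equal to `rank(x) - len(sample_axis) + len(event_axis)` when `keepdims=False` (each sample axis is reduced away and each event axis appears twice, as in the `(2, 3, 3)`-shaped `corr_matrix` above). |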
Raises | |
---|---|
`AssertionError` | If `x` and `y` are found to have different shapes. |
`ValueError` | If `sample_axis` and `event_axis` are found to overlap. |
`ValueError` | If `event_axis` is found to not be contiguous. |