# tfp.stats.quantile_auc

Calculate ranking stats AUROC and AUPRC.

Computes AUROC and AUPRC from quantiles (one for positive trials and one for negative trials).

We use `pi(x)` to denote a score. We assume that if `pi(x) > k` for some threshold `k` then the event is predicted to be "1", and otherwise it is predicted to be a "0". Its actual label is `y`, which may or may not be the same.

Area Under Curve: Receiver Operator Characteristic (AUROC) is defined as:

/ 1 | TruePositiveRate(k) d FalsePositiveRate(k) / 0

where,

``````TruePositiveRate(k) = P(pi(x) > k | y = 1)
FalsePositiveRate(k) = P(pi(x) > k | y = 0)
``````

Area Under Curve: Precision-Recall (AUPRC) is defined as:

/ 1 | Precision(k) d Recall(k) / 0

where,

``````  Precision(k) = P(y = 1 | pi(x) > k)
Recall(k) = TruePositiveRate(k) = P(pi(x) > k | y = 1)
``````

Notice that AUROC and AUPRC exchange the role of Recall in the integration, i.e.,

``````        Integrand    Measure
+------------+-----------+
``````

AUROC | Recall | FPR | +------------+-----------+ AUPRC | Precision | Recall | +------------+-----------+

`q0` `N-D` `Tensor` of `float`, Quantiles of predicted probabilities given a negative trial. The first `N-1` dimensions are batch dimensions, and the AUC is calculated over the final dimension.
`n0` `float` or `(N-1)-D Tensor`, Number of negative trials. If `Tensor`, dimensions must match the first `N-1` dimensions of `q0`.
`q1` `N-D` `Tensor` of `float`, Quantiles of predicted probabilities given a positive trial. The first `N-1` dimensions are batch dimensions, which must match those of `q0`.
`n1` `float` or `(N-1)-D Tensor`, Number of positive trials. If `Tensor`, dimensions must match the first `N-1` dimensions of `q0`.
`curve` `str`, Specifies the name of the curve to be computed. Must be 'ROC' [default] or 'PR' for the Precision-Recall-curve.
`name` `str`, An optional name_scope name.

`auc` `Tensor` of `float`, area under the ROC or PR curve.

#### Examples

``````  n = 1000
m = 500
predictions_given_positive = np.random.rand(n)
predictions_given_negative = np.random.rand(m)
q1 = tfp.stats.quantiles(predictions_given_positive, num_quantiles=50)
q0 = tfp.stats.quantiles(predictions_given_negative, num_quantiles=50)
auroc = tfp.stats.quantile_auc(q0, m, q1, n, curve='ROC')

``````

### Mathematical Details

The algorithm proceeds by partitioning the combined quantile data into a series of intervals `[a, b)`, approximating the probability mass of predictions conditioned on a positive trial (`d1`) and probability mass of predictions conditioned on a negative trial (`d0`) in each interval `[a, b)`, and accumulating the incremental AUROC/AUPRC as functions of `d0` and `d1`.

We assume that pi(x) is uniform within a given bucket of each quantile. Thus it will also be uniform within an interval [a, b) as long as the interval does not cross the quantile's bucket boundaries.

A consequence of this assumption is that the cdf is piecewise linear. That is,

P( pi(x) > k | y = 0 ), and, P( pi(x) > k | y = 1 ),

are linear in `k`.

Standard AUROC is fairly easier to calculate. Under the conditional uniformity assumptions we have a piece's contribution, [a, b), as:

/ b | | P(y = 1 | pi > k) d P(pi > k | y = 0) | / a

``````/ b
|
``````

= - | P(pi > k | y = 1) P(pi = k | y = 0) d k | / a

``````                   / b
``````

-1 / (len(q0) - 1) | = -------------------- | P(pi > k | y = 1) d k q0[j + 1] - q0[j] | / a

``````                          / 1
``````

1 / (len(q0) - 1) | = ------------------- (b - a) | (p1 + u d1) d u q0[j + 1] - q0[j] | / 0

1 / (len(q0) - 1) = ------------------- (b - a) (p1 + d1 / 2) q0[j + 1] - q0[j]

AUPRC is a bit harder to calculate since the integrand, `P(y > 0 | pi(x) > k)`, is conditional on `k` rather than a probability over `k`.

We proceed by formulating Precision in terms of the quantiles we have available to us.

Precision(k) = P(y = 1 | pi(x) > k)

``````               P(pi(x) > delta | y = 1 ) P(y = 1)
``````

= ----------------------------------------------------------------------- P(pi(x) > delta | y = 1 ) P(y = 1) + P(pi(x) > delta | y = 0 ) P(y = 0)

Since the cdf's are piecewise linear, we calculate this piece's contribution to AUPRC by the integral:

/ b | | P(y = 1 | pi(x) > delta) d P(pi > delta | y = 1) | / a

1 / (len(q1) - 1) = ------------------- (b - a) * q1[i + 1] - q1[i]

``````  / 1
|             n1 * (u d1 + p1)

* |  -------------------------------------- du
|   n1 * (u d1 + p1) +  n0 * (u d0 + p0)
/ 0

/ 1
``````

1 / (len(q1) - 1) | n1 * (u d1 + p1) = ------------------- (b - a) | ------------------------------------ du q1[i + 1] - q1[i] | n1 * (u d1 + p1) + n0 * (u d0 + p0) / 0

where the equality is a consequence of the piecewise uniformity assumption.

The solution to the integral is given by Mathematica:

``````Integrate[n1 (u d1 + p1) / (n1 (u d1 + p1) + n0 (u d0 + p0)), {u, 0, 1},
Assumptions -> {p1 > 0, d1 > 0, p0 > 0, d0 > 0, n1 > 0, n0 > 0}]
``````

This integral can be solved by hand by noticing that:

f(x) / (f(x) + g(x)) = 1 / (1 + g(x)/f(x))

Thus define: u = 1 + g(x)/f(x) for which: du = [g'(x)h(x) - g(x)f'(x)] / h(x)^2 dx and solving integral 1/u du.

: Jesse Davis and Mark Goadrich. The relationship between Precision-Recall and ROC curves. In International Conference on Machine Learning, 2006. http://dl.acm.org/citation.cfm?id=1143874