Dirichlet-Multinomial compound distribution.

Inherits From: Distribution

The Dirichlet-Multinomial distribution is parameterized by a (batch of) length-K concentration vectors (K > 1) and a total_count number of trials, i.e., the number of trials per draw from the DirichletMultinomial. It is defined over a (batch of) length-K vector counts such that tf.reduce_sum(counts, -1) = total_count. The Dirichlet-Multinomial is identically the Beta-Binomial distribution when K = 2.

Mathematical Details

The Dirichlet-Multinomial is a distribution over K-class counts, i.e., a length-K vector of non-negative integer counts = n = [n_0, ..., n_{K-1}].

The probability mass function (pmf) is,

pmf(n; alpha, N) = Beta(alpha + n) / (prod_j n_j!) / Z
Z = Beta(alpha) / N!


where:

  • concentration = alpha = [alpha_0, ..., alpha_{K-1}], alpha_j > 0,
  • total_count = N, N a positive integer,
  • N! is N factorial,
  • Beta(x) = prod_j Gamma(x_j) / Gamma(sum_j x_j) is the multivariate beta function, and
  • Gamma is the gamma function.
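As a library-independent check of the formula above, the pmf can be evaluated in log space with only the Python standard library. This is a minimal sketch, not the class's implementation; the function names log_multivariate_beta, dirichlet_multinomial_pmf and all_count_vectors are hypothetical helpers introduced for illustration.

```python
import math
from itertools import product


def log_multivariate_beta(x):
    """log Beta(x) = sum_j lgamma(x_j) - lgamma(sum_j x_j)."""
    return sum(math.lgamma(v) for v in x) - math.lgamma(sum(x))


def dirichlet_multinomial_pmf(counts, alpha):
    """pmf(n; alpha, N) = Beta(alpha + n) / (prod_j n_j!) / Z, Z = Beta(alpha) / N!."""
    total_count = sum(counts)  # N is implied by the counts
    log_z = log_multivariate_beta(alpha) - math.lgamma(total_count + 1)
    log_numerator = log_multivariate_beta([a + n for a, n in zip(alpha, counts)])
    log_factorials = sum(math.lgamma(n + 1) for n in counts)
    return math.exp(log_numerator - log_factorials - log_z)


def all_count_vectors(total_count, num_classes):
    """Enumerate every length-K non-negative integer vector summing to N."""
    return [c for c in product(range(total_count + 1), repeat=num_classes)
            if sum(c) == total_count]


# The pmf should sum to 1 over all valid count vectors.
alpha = [1.0, 2.0, 3.0]
total_mass = sum(dirichlet_multinomial_pmf(c, alpha)
                 for c in all_count_vectors(2, len(alpha)))
```

Working with lgamma rather than gamma avoids overflow for large concentrations or counts. With K = 2 the same function evaluates the Beta-Binomial pmf.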

Dirichlet-Multinomial is a compound distribution, i.e., its samples are generated as follows.

  1. Choose class probabilities: probs = [p_0,...,p_{K-1}] ~ Dir(concentration)
  2. Draw integers: counts = [n_0,...,n_{K-1}] ~ Multinomial(total_count, probs)
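The two steps above can be sketched with the Python standard library alone. This is an illustration of the compound construction, not the library's sampler; sample_dirichlet and sample_dirichlet_multinomial are hypothetical names, and the Dirichlet draw assumes the standard construction from normalized Gamma variates.

```python
import random


def sample_dirichlet(concentration, rng):
    """Step 1: normalized independent Gamma(alpha_j, 1) draws are Dirichlet distributed."""
    gammas = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_dirichlet_multinomial(total_count, concentration, rng):
    """Step 2: draw total_count class labels with the sampled probs and tally them."""
    probs = sample_dirichlet(concentration, rng)
    counts = [0] * len(concentration)
    for k in rng.choices(range(len(concentration)), weights=probs, k=total_count):
        counts[k] += 1
    return counts


rng = random.Random(0)  # seeded only to make the sketch reproducible
counts = sample_dirichlet_multinomial(5, [1.0, 2.0, 3.0], rng)
```

Each call draws fresh class probabilities, so repeated samples are more dispersed than draws from a Multinomial with fixed probs.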

The last concentration dimension parametrizes a single Dirichlet-Multinomial distribution. When calling distribution functions (e.g., dist.prob(counts)), concentration, total_count and counts are broadcast to the same shape. The last dimension of counts corresponds to single Dirichlet-Multinomial distributions.

Distribution parameters are automatically broadcast in all functions; see examples for details.


Pitfalls

The number of classes, K, must not exceed:

  • the largest integer representable by self.dtype, i.e., 2**(mantissa_bits+1) (IEEE 754),
  • the maximum Tensor index, i.e., 2**31-1.

In other words,

K <= min(2**31-1, {
  tf.float16: 2**11,
  tf.float32: 2**24,
  tf.float64: 2**53 }[param.dtype])
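The bound above can be illustrated without TensorFlow by round-tripping integers through each floating-point width. This is a sketch under stated assumptions: the dtype names are plain strings standing in for tf.float16, tf.float32 and tf.float64, and max_num_classes and survives_round_trip are hypothetical helpers.

```python
import struct

# Per-dtype limits as listed above; string keys stand in for the tf dtypes.
MAX_EXACT_INT = {"float16": 2**11, "float32": 2**24, "float64": 2**53}
STRUCT_FORMAT = {"float16": "e", "float32": "f", "float64": "d"}
MAX_TENSOR_INDEX = 2**31 - 1


def max_num_classes(dtype_name):
    """Largest K allowed by both the dtype limit and the Tensor index limit."""
    return min(MAX_TENSOR_INDEX, MAX_EXACT_INT[dtype_name])


def survives_round_trip(value, dtype_name):
    """True if the integer is stored exactly at the given float width."""
    fmt = STRUCT_FORMAT[dtype_name]
    (stored,) = struct.unpack(fmt, struct.pack(fmt, float(value)))
    return int(stored) == value


# The limit itself is exact, but the next integer is not representable.
limit_ok = survives_round_trip(2**24, "float32")
next_ok = survives_round_trip(2**24 + 1, "float32")
```

Beyond the dtype limit, consecutive integers collapse onto the same floating-point value. For tf.float64 the Tensor index limit of 2**31-1 is the binding constraint.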


Examples

alpha = [1., 2., 3.]
n = 2.
dist = DirichletMultinomial(total_count=n, concentration=alpha)

Creates a 3-class distribution in which the 3rd class is most likely to be drawn. The distribution functions can be evaluated on counts.
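One way to see why the 3rd class dominates is the expected count per class, which for a Dirichlet-Multinomial is total_count * alpha_j / sum_j alpha_j. The sketch below computes it in plain Python; expected_counts is a hypothetical helper, not a method of the class.

```python
def expected_counts(total_count, concentration):
    """E[n_j] = N * alpha_j / sum_j alpha_j for each class j."""
    total_concentration = sum(concentration)
    return [total_count * a / total_concentration for a in concentration]


# Same parameters as the example above: alpha = [1., 2., 3.], n = 2.
mean = expected_counts(2.0, [1.0, 2.0, 3.0])
most_likely_class = max(range(len(mean)), key=lambda j: mean[j])  # zero-based index
```

The class with the largest concentration receives the largest expected share of the total_count trials.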