Adds a Jensen-Shannon divergence to the training procedure.

For brevity, let P = labels and Q = predictions, and let KL(P||Q) be the Kullback-Leibler divergence as defined in the description of the nsl.lib.kl_divergence function. The Jensen-Shannon divergence (JSD) is

M = (P + Q) / 2
JSD(P||Q) = KL(P||M) / 2 + KL(Q||M) / 2
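
The two formulas above can be sketched in plain Python for a single pair of distributions. This is an illustrative sketch only, not the library's implementation: the helper names are hypothetical, natural logarithms are assumed, and terms with a zero probability in the first argument are treated as contributing zero.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 are taken to contribute 0 (assumption)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jensen_shannon_divergence(p, q):
    """JSD(p || q) = KL(p || m) / 2 + KL(q || m) / 2, with m = (p + q) / 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    # Wherever p_i > 0, m_i >= p_i / 2 > 0, so the ratio p_i / m_i is well defined.
    return kl_divergence(p, m) / 2.0 + kl_divergence(q, m) / 2.0


labels = [0.7, 0.2, 0.1]        # P: target distribution over 3 classes
predictions = [0.1, 0.3, 0.6]   # Q: predicted distribution over 3 classes
jsd = jensen_shannon_divergence(labels, predictions)
```

Unlike the KL divergence, this quantity is symmetric in P and Q, and with natural logarithms it is bounded above by ln 2, which is reached when the two distributions have disjoint support.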

This function assumes that predictions and labels are the values of a multinomial distribution, i.e., each value is the probability of the corresponding class.
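
In practice this means raw logits need to be normalized (for example with a softmax) before being passed in. The sketch below shows one way to check the requirement for a single vector; the helper names and the tolerance are assumptions made for illustration and are not part of the library.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(values, tol=1e-6):
    """True if every value is non-negative and the values sum to 1 within tol."""
    return all(v >= 0.0 for v in values) and math.isclose(sum(values), 1.0, abs_tol=tol)


logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
```

Here `is_valid_distribution(logits)` is false, because the raw scores do not sum to 1, while `is_valid_distribution(probs)` is true.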

For the usage of weights and reduction, please refer to tf.losses.

labels Tensor of type float32 or float64, with shape [d1, ..., dN, num_classes], representing the target distribution.
predictions Tensor of the same type and shape as labels, representing the predicted distribution.
axis The dimension along which the Jensen-Shannon divergence is computed. The values of labels and predictions along axis should meet the requirements of a multinomial distribution.
weights (optional) Tensor whose rank is either 0, or the same as that of labels, and must be broadcastable to labels (i.e., all dimensions must be either 1, or the same as the corresponding losses dimension).
scope The scope for the operations performed in computing the loss.
loss_collection Collection to which the loss will be added.
reduction Type of reduction to apply to the loss.

Weighted loss float Tensor. If reduction is tf.compat.v1.losses.Reduction.NONE, this has the same shape as labels; otherwise, it is a scalar.
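
As a rough illustration of how weights and reduction interact, the sketch below mimics the behavior of several tf.compat.v1.losses.Reduction modes on a flat list of per-example losses. The function name and the string mode names are hypothetical stand-ins chosen for this example; consult tf.losses for the authoritative definitions, including how zero denominators are handled.

```python
def reduce_losses(losses, weights, reduction):
    """Apply per-example weights, then reduce (illustrative stand-in for tf.losses)."""
    weighted = [loss * w for loss, w in zip(losses, weights)]
    if reduction == "none":
        return weighted                      # no reduction: one value per example
    total = sum(weighted)
    if reduction == "sum":
        return total                         # scalar sum of weighted losses
    if reduction == "mean":
        denom = sum(weights)                 # divide by the sum of the weights
    elif reduction == "sum_by_nonzero_weights":
        denom = sum(1 for w in weights if w != 0.0)  # divide by count of nonzero weights
    else:
        raise ValueError(f"unknown reduction: {reduction}")
    # Assumption for this sketch: return 0.0 rather than dividing by zero.
    return total / denom if denom else 0.0


per_example_jsd = [0.2, 0.4, 0.6]   # made-up per-example divergences
weights = [1.0, 0.0, 2.0]           # the second example is masked out

unreduced = reduce_losses(per_example_jsd, weights, "none")
summed = reduce_losses(per_example_jsd, weights, "sum")
averaged = reduce_losses(per_example_jsd, weights, "sum_by_nonzero_weights")
```

With these made-up inputs, a weight of 0 removes the second example from both the numerator and, for the nonzero-weights mode, the denominator.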

InvalidArgumentError If labels or predictions don't meet the requirements of a multinomial distribution.
ValueError If axis is None, the shape of predictions doesn't match that of labels, or if the shape of weights is invalid.