Creates a cross-entropy loss using tf.nn.softmax_cross_entropy_with_logits_v2.

Migrate to TF2

tf.compat.v1.losses.softmax_cross_entropy is mostly compatible with eager execution and tf.function. However, the loss_collection argument is ignored when executing eagerly, and no loss is written to the loss collections. You need to either hold on to the return value manually or rely on tf.keras.Model loss tracking.
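As a minimal sketch of the first option, the training step below computes the TF1 loss inside tf.function and returns the tensor instead of relying on a collection. The variables W and b, the train_step function, the plain gradient-descent update, and the toy data are illustrative assumptions, not part of this API.

```python
import numpy as np
import tensorflow as tf

# Hypothetical linear model: 4 input features -> 3 class logits.
W = tf.Variable(tf.zeros([4, 3]))
b = tf.Variable(tf.zeros([3]))


@tf.function
def train_step(features, onehot_labels, learning_rate=0.1):
    with tf.GradientTape() as tape:
        logits = tf.matmul(features, W) + b
        # TF2 code does not read the loss back from a collection,
        # so the returned tensor is the handle we keep and return.
        loss = tf.compat.v1.losses.softmax_cross_entropy(onehot_labels, logits)
    grad_W, grad_b = tape.gradient(loss, [W, b])
    # Plain gradient-descent update (an assumption; any optimizer would do).
    W.assign_sub(learning_rate * grad_W)
    b.assign_sub(learning_rate * grad_b)
    return loss


features = tf.constant([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
onehot_labels = tf.constant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# All logits start at 0, so the softmax is uniform and the first loss is ln(3).
expected_first_loss = np.log(3.0)
first_loss = train_step(features, onehot_labels).numpy()
second_loss = train_step(features, onehot_labels).numpy()
```

The loss returned by each call is computed before that call's update, so first_loss matches ln(3) and second_loss is smaller once one update has been applied.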

To switch to native TF2 style, instantiate the tf.keras.losses.CategoricalCrossentropy class with from_logits set to True and call the object instead.

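Structural Mapping to Native TF2

Before (TF1):

loss = tf.compat.v1.losses.softmax_cross_entropy(
  onehot_labels=onehot_labels, logits=logits, weights=weights,
  label_smoothing=smoothing)

After (TF2):

loss_fn = tf.keras.losses.CategoricalCrossentropy(
  from_logits=True, label_smoothing=smoothing)
loss = loss_fn(y_true=onehot_labels, y_pred=logits, sample_weight=weights)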

How to Map Arguments

TF1 Arg Name     TF2 Arg Name     Note
-                from_logits      Set from_logits to True to get identical behavior.
onehot_labels    y_true           Passed to the __call__() method.
logits           y_pred           Passed to the __call__() method.
weights          sample_weight    Passed to the __call__() method.
label_smoothing  label_smoothing  Passed to the constructor.
scope            Not supported    -
loss_collection  Not supported    Track losses explicitly or with Keras APIs (for example, add_loss) instead of via collections.
reduction        reduction        Passed to the constructor. See the value mapping below.

The reduction values tf.compat.v1.losses.Reduction.SUM_OVER_BATCH_SIZE, tf.compat.v1.losses.Reduction.SUM, and tf.compat.v1.losses.Reduction.NONE in tf.compat.v1.losses.softmax_cross_entropy correspond to tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE, tf.keras.losses.Reduction.SUM, and tf.keras.losses.Reduction.NONE, respectively. Any other reduction value, including the default tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS, has no direct equivalent; in that case, modify the loss implementation manually.
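When the TF1 code relied on the default SUM_BY_NONZERO_WEIGHTS reduction, one possible manual replacement is to ask Keras for unreduced, weighted per-sample losses and divide their sum by the number of nonzero weights. The helper cce_sum_by_nonzero_weights below is a hypothetical sketch, not a TensorFlow API; it passes reduction as the string "none" (the value behind tf.keras.losses.Reduction.NONE) and assumes weights is a scalar or has shape [batch_size].

```python
import tensorflow as tf


def cce_sum_by_nonzero_weights(onehot_labels, logits, weights, label_smoothing=0.0):
    """Hypothetical helper emulating TF1's SUM_BY_NONZERO_WEIGHTS with Keras."""
    loss_fn = tf.keras.losses.CategoricalCrossentropy(
        from_logits=True, label_smoothing=label_smoothing, reduction="none")
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)
    # Weighted per-sample losses, shape [batch_size].
    per_sample = loss_fn(onehot_labels, logits, sample_weight=weights)
    # Count nonzero weights after broadcasting them to the per-sample shape.
    num_nonzero = tf.math.count_nonzero(
        tf.broadcast_to(weights, tf.shape(per_sample)), dtype=tf.float32)
    # divide_no_nan returns 0 rather than NaN when every weight is zero.
    return tf.math.divide_no_nan(tf.reduce_sum(per_sample), num_nonzero)


y_true = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1], [0.3, 0.2, 0.5]]
weights = [0.3, 0.7, 0.0]  # the third sample is masked out

tf1_loss = tf.compat.v1.losses.softmax_cross_entropy(
    y_true, y_pred, weights=weights, label_smoothing=0.2)
manual_loss = cce_sum_by_nonzero_weights(y_true, y_pred, weights, label_smoothing=0.2)
keras_default = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, label_smoothing=0.2)(y_true, y_pred, sample_weight=weights)
```

With one zero weight the two reductions diverge: TF1 divides the weighted sum by 2 (the nonzero weights), while the default Keras SUM_OVER_BATCH_SIZE divides it by 3 (the batch size), so only the manual helper reproduces tf1_loss.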

Before & After Usage Example

Before (TF1):

import tensorflow as tf

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
weights = [0.3, 0.7]
smoothing = 0.2
tf.compat.v1.losses.softmax_cross_entropy(y_true, y_pred, weights=weights,
  label_smoothing=smoothing).numpy()
# ≈ 0.57618

After (TF2):

cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True,
  label_smoothing=smoothing)
cce(y_true, y_pred, sample_weight=weights).numpy()
# ≈ 0.57618, the same as the TF1 result


weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.
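Both weighting modes can be checked directly with the TF1 function. This sketch uses the same labels and logits as the usage example, without label smoothing, and relies on the default SUM_BY_NONZERO_WEIGHTS reduction; the alias xent is just a local shorthand.

```python
import tensorflow as tf

onehot_labels = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
logits = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]
xent = tf.compat.v1.losses.softmax_cross_entropy  # local shorthand

# Unweighted, unreduced per-sample losses: shape [batch_size] = [2].
per_sample = xent(onehot_labels, logits,
                  reduction=tf.compat.v1.losses.Reduction.NONE)

# Scalar weight: the reduced loss is simply scaled by 2.
base = xent(onehot_labels, logits)
scaled = xent(onehot_labels, logits, weights=2.0)

# Per-sample weights of shape [batch_size]: the second sample gets weight 0,
# and with SUM_BY_NONZERO_WEIGHTS it is also left out of the denominator,
# so the result is exactly the first sample's loss.
first_only = xent(onehot_labels, logits, weights=[1.0, 0.0])
```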

If label_smoothing is nonzero, the labels are smoothed towards 1/num_classes:

new_onehot_labels = onehot_labels * (1 - label_smoothing) + label_smoothing / num_classes
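The smoothing formula can be checked with a few lines of NumPy. smooth_labels is a hypothetical helper that mirrors the formula above; it is not a TensorFlow API.

```python
import numpy as np


def smooth_labels(onehot_labels, label_smoothing):
    """Hypothetical helper applying the label-smoothing formula."""
    onehot_labels = np.asarray(onehot_labels, dtype=np.float64)
    num_classes = onehot_labels.shape[-1]
    return onehot_labels * (1.0 - label_smoothing) + label_smoothing / num_classes


smoothed = smooth_labels([[0, 1, 0], [0, 0, 1]], label_smoothing=0.2)
# The true class gets 1 - 0.2 + 0.2/3 = 13/15 and every other class 0.2/3 = 1/15,
# so each row still sums to 1.
```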

Note that onehot_labels and logits must have the same shape, for example [batch_size, num_classes]. The shape of weights must be broadcastable to the per-sample loss, whose shape is determined by the shape of logits: if logits has shape [batch_size, num_classes], the per-sample loss is a Tensor of shape [batch_size].
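To make the shapes concrete, here is a NumPy sketch of the whole computation: label smoothing, a per-sample softmax cross-entropy of shape [batch_size], weights broadcast to that shape, and the default SUM_BY_NONZERO_WEIGHTS reduction (weighted sum divided by the number of nonzero weights). per_sample_loss and sum_by_nonzero_weights are hypothetical helpers written from this page's description, a reference for reasoning rather than TensorFlow's implementation.

```python
import numpy as np


def per_sample_loss(onehot_labels, logits, label_smoothing=0.0):
    """Hypothetical reference: one softmax cross-entropy value per row of logits."""
    labels = np.asarray(onehot_labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    if labels.shape != logits.shape:
        raise ValueError(f"shape mismatch: {labels.shape} vs {logits.shape}")
    num_classes = logits.shape[-1]
    labels = labels * (1.0 - label_smoothing) + label_smoothing / num_classes
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_probs).sum(axis=-1)  # shape [batch_size]


def sum_by_nonzero_weights(losses, weights):
    """Weighted sum divided by the number of nonzero (broadcast) weights."""
    weights = np.broadcast_to(np.asarray(weights, dtype=np.float64), losses.shape)
    num_present = np.count_nonzero(weights)
    return float((losses * weights).sum() / num_present) if num_present else 0.0


losses = per_sample_loss([[0, 1, 0], [0, 0, 1]],
                         [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]],
                         label_smoothing=0.2)
loss = sum_by_nonzero_weights(losses, [0.3, 0.7])  # ≈ 0.57618, as in the usage example
```

A scalar weight is broadcast to every sample, so sum_by_nonzero_weights(losses, 2.0) is just twice the mean, which matches the description of scalar weights above.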

Args

onehot_labels    One-hot-encoded labels.
logits           Logits output by the network.
weights          Optional Tensor that is broadcastable to loss.
label_smoothing  If greater than 0, smooth the labels.
scope            The scope for the operations performed in computing the loss.
loss_collection  Collection to which the loss will be added.
reduction        Type of reduction to apply to loss.

Returns

Weighted loss Tensor of the same type as logits. If reduction is NONE, this has shape [batch_size]; otherwise, it is a scalar.

Raises

ValueError  If the shape of logits doesn't match that of onehot_labels, if the shape of weights is invalid, if weights is None, or if onehot_labels or logits is None.
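A short sketch of two of these error cases, mismatched class dimensions and missing labels. raises_value_error is a hypothetical test helper, and the data is illustrative.

```python
import tensorflow as tf

onehot_labels = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]          # 3 classes
bad_logits = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]   # 4 classes: mismatch


def raises_value_error(fn):
    """Hypothetical helper: True if calling fn() raises ValueError."""
    try:
        fn()
    except ValueError:
        return True
    return False


shape_mismatch = raises_value_error(
    lambda: tf.compat.v1.losses.softmax_cross_entropy(onehot_labels, bad_logits))
missing_labels = raises_value_error(
    lambda: tf.compat.v1.losses.softmax_cross_entropy(None, bad_logits))
```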