An optimizer that applies loss scaling to prevent numeric underflow.
tf.keras.mixed_precision.LossScaleOptimizer(
    inner_optimizer, dynamic=True, initial_scale=None, dynamic_growth_steps=None
)
Loss scaling is a technique to prevent numeric underflow in intermediate gradients when float16 is used. To prevent underflow, the loss is multiplied (or "scaled") by a certain factor called the "loss scale", which causes intermediate gradients to be scaled by the loss scale as well. The final gradients are divided (or "unscaled") by the loss scale to bring them back to their original value.
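The underflow problem described above can be sketched with plain NumPy, assuming an illustrative loss scale of 1024 (the names here are for demonstration only, not part of the Keras API):

```python
import numpy as np

# A tiny gradient value: below float16's smallest subnormal
# (about 6e-8), so it rounds to zero.
grad = np.float16(1e-8)
print(grad)  # 0.0 -- underflowed

# Scaling the loss by 1024 scales intermediate gradients too,
# keeping them representable in float16.
loss_scale = 1024.0
scaled_grad = np.float16(1e-8 * loss_scale)  # nonzero in float16

# Unscaling in a wider dtype recovers (approximately) the true gradient.
unscaled = np.float32(scaled_grad) / loss_scale
```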
LossScaleOptimizer wraps another optimizer and applies loss scaling to it. By default, the loss scale is dynamically updated over time so you do not have to choose the loss scale. The minimize method automatically scales the loss, unscales the gradients, and updates the loss scale, so all you have to do is wrap your optimizer with a LossScaleOptimizer if you use minimize. For example:
opt = tf.keras.optimizers.SGD(0.25)
opt = tf.keras.mixed_precision.LossScaleOptimizer(opt)
var = tf.Variable(1.)
loss_fn = lambda: var ** 2
# 'minimize' applies loss scaling and updates the loss scale.
opt.minimize(loss_fn, var_list=[var])
If a tf.GradientTape is used to compute gradients instead of minimize, you must scale the loss and gradients manually. This can be done with the LossScaleOptimizer.get_scaled_loss and LossScaleOptimizer.get_unscaled_gradients methods. For example:
with tf.GradientTape() as tape:
  loss = loss_fn()
  scaled_loss = opt.get_scaled_loss(loss)
scaled_grad = tape.gradient(scaled_loss, var)
(grad,) = opt.get_unscaled_gradients([scaled_grad])
opt.apply_gradients([(grad, var)])  # Loss scale is updated here
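The scale-then-unscale arithmetic underlying the manual steps above can be checked with plain NumPy, assuming a loss scale of 1024 (a sketch; the variable names are illustrative, not part of the Keras API):

```python
import numpy as np

loss_scale = np.float32(1024.0)
var = np.float32(1.0)

# For loss = var ** 2, the true gradient is d(loss)/d(var) = 2 * var.
true_grad = 2 * var

# Scaling the loss scales the gradient by the same factor,
# since gradients are linear in the loss.
scaled_grad = true_grad * loss_scale

# Dividing by the loss scale recovers the original gradient exactly
# (the scale is a power of two, so no precision is lost).
grad = scaled_grad / loss_scale
```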