Returns a tff.learning.optimizers.Optimizer for momentum SGD.

This optimizer supports plain gradient descent and its variant with momentum.

If momentum is not used, the update rule, given learning rate lr, weights w, and gradients g, is:

w = w - lr * g

If momentum m (a float between 0.0 and 1.0) is used, the update rule is:

v = m * v + g
w = w - lr * v

where v is the velocity from previous steps of the optimizer.
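The two update rules above can be sketched in plain Python. This is an illustrative standalone function (the name `sgdm_step` and its signature are not part of the TFF API), not the library's implementation:

```python
def sgdm_step(w, g, v, lr=0.01, momentum=None):
    """One momentum-SGD update on scalar weights.

    Returns (new_weights, new_velocity). With momentum=None this
    reduces to plain gradient descent: w = w - lr * g.
    """
    if momentum is None:
        return w - lr * g, v
    # Accumulate velocity, then step along it.
    v = momentum * v + g
    return w - lr * v, v

# Two steps with lr = 0.1, momentum = 0.9, and a constant gradient g = 1.0:
w, v = 1.0, 0.0
w, v = sgdm_step(w, 1.0, v, lr=0.1, momentum=0.9)  # v = 1.0, w = 0.9
w, v = sgdm_step(w, 1.0, v, lr=0.1, momentum=0.9)  # v = 1.9, w ≈ 0.71
```

Note that with a constant gradient the velocity keeps growing toward g / (1 - m), so momentum effectively amplifies the step size in directions where gradients are consistent.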

Args:
  learning_rate: A positive float for the learning rate; defaults to 0.01.
  momentum: An optional float between 0.0 and 1.0. If None, no momentum is used.