tf.keras.optimizers.SGD
Gradient descent (with momentum) optimizer.
Inherits From: Optimizer
tf.keras.optimizers.SGD(
    learning_rate=0.01,
    momentum=0.0,
    nesterov=False,
    name='SGD',
    **kwargs
)
Update rule for parameter w with gradient g when momentum is 0:

w = w - learning_rate * g

Update rule when momentum is larger than 0:

velocity = momentum * velocity - learning_rate * g
w = w + velocity

When nesterov=True, this rule becomes:

velocity = momentum * velocity - learning_rate * g
w = w + momentum * velocity - learning_rate * g
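The three rules above can be checked by hand. A minimal sketch in plain Python (hypothetical values; not the optimizer's internal implementation):

learning_rate = 0.01
momentum = 0.9
w = 1.0
velocity = 0.0
g = w  # gradient of loss = w ** 2 / 2 evaluated at w

# momentum == 0: plain gradient descent
w_plain = w - learning_rate * g

# momentum > 0: accumulate a velocity term, then step by it
velocity = momentum * velocity - learning_rate * g
w_momentum = w + velocity

# nesterov=True: look ahead along the updated velocity
w_nesterov = w + momentum * velocity - learning_rate * g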
Args

learning_rate: A Tensor, floating point value, or a schedule that is a tf.keras.optimizers.schedules.LearningRateSchedule, or a callable that takes no arguments and returns the actual value to use. The learning rate. Defaults to 0.01.
momentum: Float hyperparameter >= 0 that accelerates gradient descent in the relevant direction and dampens oscillations. Defaults to 0, i.e., vanilla gradient descent.
nesterov: Boolean. Whether to apply Nesterov momentum. Defaults to False.
name: Optional name prefix for the operations created when applying gradients. Defaults to "SGD".
**kwargs: Keyword arguments. Allowed to be one of "clipnorm" or "clipvalue". "clipnorm" (float) clips gradients by norm; "clipvalue" (float) clips gradients by value. See the sketch after this list for an example that combines a learning-rate schedule with "clipnorm".
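The following is a minimal sketch showing how learning_rate can take a schedule and how the "clipnorm" keyword argument can be passed; the schedule parameters here are illustrative, not recommended values.

import tensorflow as tf

# Decay the learning rate from 0.1 by a factor of 0.96 every 1000 steps
# (hypothetical schedule parameters, chosen only for illustration).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1, decay_steps=1000, decay_rate=0.96)

# Clip each gradient to a maximum L2 norm of 1.0 before applying it.
opt = tf.keras.optimizers.SGD(
    learning_rate=lr_schedule, momentum=0.9, clipnorm=1.0)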
Usage:

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
var = tf.Variable(1.0)
loss = lambda: (var ** 2) / 2.0  # d(loss)/d(var) = var
step_count = opt.minimize(loss, [var]).numpy()
# Step is `- learning_rate * grad`
var.numpy()  # 0.9

opt = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
var = tf.Variable(1.0)
val0 = var.value()
loss = lambda: (var ** 2) / 2.0  # d(loss)/d(var) = var
# First step is `- learning_rate * grad`
step_count = opt.minimize(loss, [var]).numpy()
val1 = var.value()
(val0 - val1).numpy()  # 0.1
# On later steps, the step size grows because of momentum
step_count = opt.minimize(loss, [var]).numpy()
val2 = var.value()
(val1 - val2).numpy()  # 0.18
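In training code, the optimizer is usually passed to Model.compile rather than driven through minimize directly. A short sketch under assumed model and data (both made up for illustration):

import numpy as np
import tensorflow as tf

# A tiny linear model; the architecture and data are placeholders.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(
    optimizer=tf.keras.optimizers.SGD(
        learning_rate=0.01, momentum=0.9, nesterov=True),
    loss='mse')

x = np.random.rand(32, 4).astype('float32')  # dummy inputs
y = np.random.rand(32, 1).astype('float32')  # dummy targets
model.fit(x, y, epochs=1, verbose=0)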
Reference

- For nesterov=True, see Sutskever et al., 2013: http://jmlr.org/proceedings/papers/v28/sutskever13.pdf

Raises

ValueError: In case of any invalid argument.