tf.tpu.experimental.embedding.SGD
Optimization parameters for stochastic gradient descent for TPU embeddings.
tf.tpu.experimental.embedding.SGD(
    learning_rate: Union[float, Callable[[], float]] = 0.01,
    use_gradient_accumulation: bool = True,
    clip_weight_min: Optional[float] = None,
    clip_weight_max: Optional[float] = None,
    weight_decay_factor: Optional[float] = None,
    multiply_weight_decay_factor_by_learning_rate: Optional[bool] = None,
    clipvalue: Optional[ClipValueType] = None,
    low_dimensional_packing_status: bool = False
)
Used in the notebooks
Used in the tutorials: TensorFlow 2 TPUEmbeddingLayer: Quick Start (https://www.tensorflow.org/recommenders/examples/tpu_embedding_layer)
Pass this to tf.tpu.experimental.embedding.TPUEmbedding
via the optimizer
argument to set the global optimizer and its parameters:
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    ...
    optimizer=tf.tpu.experimental.embedding.SGD(0.1))
This can also be used in a tf.tpu.experimental.embedding.TableConfig as the
optimizer parameter to set a table-specific optimizer. This overrides the
global embedding optimizer and its parameters defined above:
table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...,
    optimizer=tf.tpu.experimental.embedding.SGD(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)

feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_two))

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=...,
    optimizer=tf.tpu.experimental.embedding.SGD(0.1))
In the above example, the first feature will be looked up in a table that has
a learning rate of 0.2 while the second feature will be looked up in a table
that has a learning rate of 0.1.
See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a
complete description of these parameters and their impacts on the optimizer
algorithm.
Args

learning_rate
    The learning rate. It should be a floating point value or a callable
    taking no arguments for a dynamic learning rate (a sketch of a callable
    learning rate follows this list).
use_gradient_accumulation
    Setting this to False makes embedding gradient calculation less accurate
    but faster.
clip_weight_min
    The minimum value to clip the weights by; None means -infinity.
clip_weight_max
    The maximum value to clip the weights by; None means +infinity.
weight_decay_factor
    Amount of weight decay to apply; None means the weights are not decayed.
    Weights are decayed by multiplying them by this factor each step.
multiply_weight_decay_factor_by_learning_rate
    If true, weight_decay_factor is multiplied by the current learning rate.
clipvalue
    Controls clipping of the gradient. Set to either a single positive scalar
    value to get clipping, or a tuple of scalar values (min, max) to set a
    separate minimum and maximum. If one of the two entries is None, there is
    no clipping in that direction. Note that if this is set, you may see a
    decrease in performance because gradient accumulation will be enabled (it
    is normally off for SGD, where it has no effect on accuracy). See
    'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for more
    information on gradient accumulation and its impact on TPU embeddings.
    An example follows this list.
low_dimensional_packing_status
    Controls whether to optimize the packing of 1-dimensional, 2-dimensional,
    and 4-dimensional embedding tables in memory.
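As a hedged illustration of the parameters above, the following sketch builds an SGD configuration with a zero-argument callable for a dynamic learning rate, a (min, max) gradient clipping tuple, and a weight decay factor. The names lr_var, dynamic_lr, and table_optimizer are placeholders introduced here, not part of the API, and the numeric values are illustrative only:

import tensorflow as tf

# Placeholder schedule variable; a training loop is assumed to update it.
lr_var = tf.Variable(0.1, trainable=False)

def dynamic_lr():
  # Zero-argument callable: returns the current learning rate each time the
  # optimizer applies updates.
  return lr_var

table_optimizer = tf.tpu.experimental.embedding.SGD(
    learning_rate=dynamic_lr,
    # Clip gradients to [-1.0, 1.0]; passing (None, 1.0) would clip only the
    # maximum. Setting clipvalue enables gradient accumulation, which may
    # reduce throughput (see the clipvalue description above).
    clipvalue=(-1.0, 1.0),
    # Decay weights by multiplying them by this factor each step, scaled by
    # the current learning rate.
    weight_decay_factor=1e-4,
    multiply_weight_decay_factor_by_learning_rate=True)

Such a configuration can then be passed either as the global optimizer to tf.tpu.experimental.embedding.TPUEmbedding or per table via tf.tpu.experimental.embedding.TableConfig, exactly as in the examples above.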
Methods
__eq__
View source: https://github.com/tensorflow/tensorflow/blob/v2.16.1/tensorflow/python/tpu/tpu_embedding_v2_utils.py#L176-L183
__eq__(
    other: Any
) -> Union[Any, bool]
Return self==value.
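For illustration, two SGD configurations constructed with identical parameters are expected to compare equal, while differing parameters compare unequal; a minimal sketch:

opt_a = tf.tpu.experimental.embedding.SGD(learning_rate=0.1)
opt_b = tf.tpu.experimental.embedding.SGD(learning_rate=0.1)
opt_c = tf.tpu.experimental.embedding.SGD(learning_rate=0.2)

print(opt_a == opt_b)  # Expected: True, the parameters match.
print(opt_a == opt_c)  # Expected: False, the learning rates differ.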