This can also be used in a tf.tpu.experimental.embedding.TableConfig as the
optimizer parameter to set a table-specific optimizer. This will override the
optimizer and parameters of the global embedding optimizer defined above:
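    table_one = tf.tpu.experimental.embedding.TableConfig(
        vocabulary_size=...,
        dim=...,
        optimizer=tf.tpu.experimental.embedding.Adam(0.2))
    # table_two specifies no optimizer, so it uses the global Adam(0.1) below.
    table_two = tf.tpu.experimental.embedding.TableConfig(
        vocabulary_size=...,
        dim=...)

    feature_config = (
        tf.tpu.experimental.embedding.FeatureConfig(
            table=table_one),
        tf.tpu.experimental.embedding.FeatureConfig(
            table=table_two))

    embedding = tf.tpu.experimental.embedding.TPUEmbedding(
        feature_config=feature_config,
        batch_size=...,
        optimizer=tf.tpu.experimental.embedding.Adam(0.1))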
In the above example, the first feature will be looked up in a table that has
a learning rate of 0.2 while the second feature will be looked up in a table
that has a learning rate of 0.1.
See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a
complete description of these parameters and their impacts on the optimizer
algorithm.
Args:
  learning_rate: The learning rate. It should be a floating point value or a
    callable taking no arguments for a dynamic learning rate.
  beta_1: A float value. The exponential decay rate for the 1st moment
    estimates.
  beta_2: A float value. The exponential decay rate for the 2nd moment
    estimates.
  epsilon: A small constant for numerical stability.
  lazy_adam: Use lazy Adam instead of Adam. Lazy Adam trains faster.
  sum_inside_sqrt: When this is true, the Adam update formula is changed from
    m / (sqrt(v) + epsilon) to m / sqrt(v + epsilon**2). This option improves
    the performance of TPU training and is not expected to harm model quality.
  use_gradient_accumulation: Setting this to False makes embedding gradient
    calculation less accurate but faster.
  clip_weight_min: The minimum value to clip by; None means -infinity.
  clip_weight_max: The maximum value to clip by; None means +infinity.
  weight_decay_factor: Amount of weight decay to apply; None means that the
    weights are not decayed.
  multiply_weight_decay_factor_by_learning_rate: If true, weight_decay_factor
    is multiplied by the current learning rate.
  slot_variable_creation_fn: A callable taking two parameters, a variable and
    a list of slot names to create for it. This function should return a dict
    with the slot names as keys and the created variables as values. When set
    to None (the default), uses the built-in variable creation. A possible
    custom implementation is sketched below.
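The sketch below is purely illustrative: it constructs the optimizer with
explicit values for each argument (the values themselves are arbitrary), and
create_slots is a hypothetical callable written against the
slot_variable_creation_fn contract described above, returning one
zero-initialized slot variable per slot name.

    import tensorflow as tf

    def create_slots(table_variable, slot_names):
      # Hypothetical slot_variable_creation_fn: returns a dict mapping each
      # slot name to a zero-initialized, non-trainable variable shaped like
      # the embedding table itself.
      return {
          name: tf.Variable(tf.zeros_like(table_variable), trainable=False)
          for name in slot_names
      }

    optimizer = tf.tpu.experimental.embedding.Adam(
        learning_rate=0.001,  # or a zero-argument callable for a dynamic rate
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-07,
        lazy_adam=True,
        sum_inside_sqrt=True,
        use_gradient_accumulation=True,
        clip_weight_min=None,   # illustrative: no lower bound on the weights
        clip_weight_max=None,   # illustrative: no upper bound on the weights
        weight_decay_factor=1e-04,
        multiply_weight_decay_factor_by_learning_rate=True,
        slot_variable_creation_fn=create_slots)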
[null,null,["Last updated 2020-10-01 UTC."],[],[],null,["# tf.tpu.experimental.embedding.Adam\n\n\u003cbr /\u003e\n\n|-----------------------------------------------------------------------------------------------------------------------------------------|\n| [View source on GitHub](https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/python/tpu/tpu_embedding_v2_utils.py#L314-L450) |\n\nOptimization parameters for Adam with TPU embeddings.\n\n#### View aliases\n\n\n**Compat aliases for migration**\n\nSee\n[Migration guide](https://www.tensorflow.org/guide/migrate) for\nmore details.\n\n[`tf.compat.v1.tpu.experimental.embedding.Adam`](/api_docs/python/tf/tpu/experimental/embedding/Adam)\n\n\u003cbr /\u003e\n\n tf.tpu.experimental.embedding.Adam(\n learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, lazy_adam=True,\n sum_inside_sqrt=True, use_gradient_accumulation=True, clip_weight_min=None,\n clip_weight_max=None, weight_decay_factor=None,\n multiply_weight_decay_factor_by_learning_rate=None,\n slot_variable_creation_fn=None\n )\n\nPass this to [`tf.tpu.experimental.embedding.TPUEmbedding`](../../../../tf/tpu/experimental/embedding/TPUEmbedding) via the `optimizer`\nargument to set the global optimizer and its parameters:\n**Note:** By default this optimizer is lazy, i.e. it will not apply the gradient update of zero to rows that were not looked up. You can change this behavior by setting `lazy_adam` to `False`. \n\n embedding = tf.tpu.experimental.embedding.TPUEmbedding(\n ...\n optimizer=tf.tpu.experimental.embedding.Adam(0.1))\n\nThis can also be used in a [`tf.tpu.experimental.embedding.TableConfig`](../../../../tf/tpu/experimental/embedding/TableConfig) as the\noptimizer parameter to set a table specific optimizer. This will override the\noptimizer and parameters for global embedding optimizer defined above: \n\n table_one = tf.tpu.experimental.embedding.TableConfig(\n vocabulary_size=...,\n dim=...,\n optimizer=tf.tpu.experimental.embedding.Adam(0.2))\n table_two = tf.tpu.experimental.embedding.TableConfig(\n vocabulary_size=...,\n dim=...)\n\n feature_config = (\n tf.tpu.experimental.embedding.FeatureConfig(\n table=table_one),\n tf.tpu.experimental.embedding.FeatureConfig(\n table=table_two))\n\n embedding = tf.tpu.experimental.embedding.TPUEmbedding(\n feature_config=feature_config,\n batch_size=...\n optimizer=tf.tpu.experimental.embedding.Adam(0.1))\n\nIn the above example, the first feature will be looked up in a table that has\na learning rate of 0.2 while the second feature will be looked up in a table\nthat has a learning rate of 0.1.\n\nSee 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a\ncomplete description of these parameters and their impacts on the optimizer\nalgorithm.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ---- ||\n|-------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `learning_rate` | The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate. |\n| `beta_1` | A float value. The exponential decay rate for the 1st moment estimates. |\n| `beta_2` | A float value. The exponential decay rate for the 2nd moment estimates. |\n| `epsilon` | A small constant for numerical stability. 
|\n| `lazy_adam` | Use lazy Adam instead of Adam. Lazy Adam trains faster. |\n| `sum_inside_sqrt` | When this is true, the Adam update formula is changed from `m / (sqrt(v) + epsilon)` to `m / sqrt(v + epsilon**2)`. This option improves the performance of TPU training and is not expected to harm model quality. |\n| `use_gradient_accumulation` | Setting this to `False` makes embedding gradients calculation less accurate but faster. |\n| `clip_weight_min` | the minimum value to clip by; None means -infinity. |\n| `clip_weight_max` | the maximum value to clip by; None means +infinity. |\n| `weight_decay_factor` | amount of weight decay to apply; None means that the weights are not decayed. |\n| `multiply_weight_decay_factor_by_learning_rate` | if true, `weight_decay_factor` is multiplied by the current learning rate. |\n| `slot_variable_creation_fn` | a callable taking two parameters, a variable and a list of slot names to create for it. This function should return a dict with the slot names as keys and the created variables as values. When set to None (the default), uses the built-in variable creation. |\n\n\u003cbr /\u003e"]]