|  View source on GitHub | 
Optimization parameters for Adam with TPU embeddings.
tf.tpu.experimental.embedding.Adam(
    learning_rate: Union[float, Callable[[], float]] = 0.001,
    beta_1: float = 0.9,
    beta_2: float = 0.999,
    epsilon: float = 1e-07,
    lazy_adam: bool = True,
    sum_inside_sqrt: bool = True,
    use_gradient_accumulation: bool = True,
    clip_weight_min: Optional[float] = None,
    clip_weight_max: Optional[float] = None,
    weight_decay_factor: Optional[float] = None,
    multiply_weight_decay_factor_by_learning_rate: bool = None,
    slot_variable_creation_fn: Optional[SlotVarCreationFnType] = None,
    clipvalue: Optional[ClipValueType] = None
)
Pass this to tf.tpu.experimental.embedding.TPUEmbedding via the optimizer
argument to set the global optimizer and its parameters:
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    ...
    optimizer=tf.tpu.experimental.embedding.Adam(0.1))
This can also be used in a tf.tpu.experimental.embedding.TableConfig as the
optimizer parameter to set a table specific optimizer. This will override the
optimizer and parameters for global embedding optimizer defined above:
table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...,
    optimizer=tf.tpu.experimental.embedding.Adam(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)
feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_two))
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=...
    optimizer=tf.tpu.experimental.embedding.Adam(0.1))
In the above example, the first feature will be looked up in a table that has a learning rate of 0.2 while the second feature will be looked up in a table that has a learning rate of 0.1.
See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a complete description of these parameters and their impacts on the optimizer algorithm.
| Args | |
|---|---|
| learning_rate | The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate. | 
| beta_1 | A float value. The exponential decay rate for the 1st moment estimates. | 
| beta_2 | A float value. The exponential decay rate for the 2nd moment estimates. | 
| epsilon | A small constant for numerical stability. | 
| lazy_adam | Use lazy Adam instead of Adam. Lazy Adam trains faster. | 
| sum_inside_sqrt | When this is true, the Adam update formula is changed
from m / (sqrt(v) + epsilon)tom / sqrt(v + epsilon**2). This
option improves the performance of TPU training and is not expected to
harm model quality. | 
| use_gradient_accumulation | Setting this to Falsemakes embedding
gradients calculation less accurate but faster. | 
| clip_weight_min | the minimum value to clip by; None means -infinity. | 
| clip_weight_max | the maximum value to clip by; None means +infinity. | 
| weight_decay_factor | amount of weight decay to apply; None means that the weights are not decayed. | 
| multiply_weight_decay_factor_by_learning_rate | if true, weight_decay_factoris multiplied by the current learning rate. | 
| slot_variable_creation_fn | If you wish do directly control the creation of the slot variables, set this to a callable taking three parameters: a table variable, a list of slot names to create for it, and a list of initializers. This function should return a dict with the slot names as keys and the created variables as values with types matching the table variable. When set to None (the default), uses the built-in variable creation. | 
| clipvalue | Controls clipping of the gradient. Set to either a single positive scalar value to get clipping or a tiple of scalar values (min, max) to set a separate maximum or minimum. If one of the two entries is None, then there will be no clipping that direction. |