Optimization parameters for Adagrad with TPU embeddings.
tf.tpu.experimental.embedding.Adagrad(
    learning_rate=0.001, initial_accumulator_value=0.1,
    use_gradient_accumulation=True, clip_weight_min=None, clip_weight_max=None,
    weight_decay_factor=None, multiply_weight_decay_factor_by_learning_rate=None,
    slot_variable_creation_fn=None
)
Pass this to tf.tpu.experimental.embedding.TPUEmbedding via the optimizer
argument to set the global optimizer and its parameters:
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    ...
    optimizer=tf.tpu.experimental.embedding.Adagrad(0.1))
This can also be used in a tf.tpu.experimental.embedding.TableConfig as the
optimizer parameter to set a table-specific optimizer. This will override the
global optimizer and its parameters defined above:
table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...,
    optimizer=tf.tpu.experimental.embedding.Adagrad(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)
feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_two))
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=...,
    optimizer=tf.tpu.experimental.embedding.Adagrad(0.1))
In the above example, the first feature will be looked up in a table that has a learning rate of 0.2, while the second feature will be looked up in a table that has a learning rate of 0.1.
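For concreteness, the override behavior can be written out as a complete sketch. The vocabulary sizes, dimensions, and batch size below are illustrative placeholders rather than recommended values, and in a real program the TPUEmbedding object is typically created under a tf.distribute.TPUStrategy scope:
import tensorflow as tf

# Illustrative placeholder values; choose your own for a real model.
table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=10000,
    dim=64,
    # Table-specific optimizer: overrides the global optimizer below
    # for lookups into this table only.
    optimizer=tf.tpu.experimental.embedding.Adagrad(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=5000,
    dim=32)  # No optimizer set, so the global optimizer applies.

feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(table=table_two))

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=128,
    optimizer=tf.tpu.experimental.embedding.Adagrad(0.1))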
See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a complete description of these parameters and their impact on the optimizer algorithm.
| Args | |
|---|---|
| learning_rate | The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate (see the sketch after this table). |
| initial_accumulator_value | Initial accumulator value for Adagrad. |
| use_gradient_accumulation | Setting this to False makes embedding gradient calculation less accurate but faster. |
| clip_weight_min | The minimum value to clip by; None means -infinity. |
| clip_weight_max | The maximum value to clip by; None means +infinity. |
| weight_decay_factor | Amount of weight decay to apply; None means that the weights are not decayed. |
| multiply_weight_decay_factor_by_learning_rate | If true, weight_decay_factor is multiplied by the current learning rate. |
| slot_variable_creation_fn | Defaults to None. If you wish to directly control the creation of the slot variables, set this to a callable taking two parameters: a variable and a list of slot names to create for it. This function should return a dict with the slot names as keys and the created variables as values. When set to None (the default), the built-in variable creation is used (see the sketch after this table). |
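As a sketch of the two less obvious arguments, the snippet below passes a zero-argument callable as learning_rate (backed by a hypothetical schedule variable named lr) and a hypothetical create_slots helper as slot_variable_creation_fn, following the two-parameter contract documented above. The clipping bounds and decay amount are illustrative values only:
import tensorflow as tf

# Hypothetical schedule variable; updating it from the training loop
# changes the learning rate used on subsequent steps.
lr = tf.Variable(0.1, trainable=False)

def learning_rate_fn():
  # Zero-argument callable: re-evaluated whenever the optimizer needs
  # the current learning rate.
  return lr.read_value()

def create_slots(table_variable, slot_names):
  # Documented contract: return a dict mapping each slot name to a
  # variable created for it. Filling the accumulators with 0.1 mirrors
  # initial_accumulator_value; that choice is illustrative.
  return {
      name: tf.Variable(
          initial_value=tf.fill(table_variable.shape, 0.1),
          trainable=False)
      for name in slot_names
  }

optimizer = tf.tpu.experimental.embedding.Adagrad(
    learning_rate=learning_rate_fn,
    initial_accumulator_value=0.1,
    clip_weight_min=-100.0,    # illustrative clipping bounds
    clip_weight_max=100.0,
    weight_decay_factor=1e-4,  # illustrative decay amount
    multiply_weight_decay_factor_by_learning_rate=True,
    slot_variable_creation_fn=create_slots)
Calling lr.assign(0.05) from the training loop would lower the rate used on later steps without rebuilding the optimizer; leaving slot_variable_creation_fn as None uses the built-in variable creation instead.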