Have a question? Connect with the community at the TensorFlow Forum Visit Forum

tf.tpu.experimental.embedding.Adagrad

View source on GitHub

Optimization parameters for Adagrad with TPU embeddings.

Pass this to tf.tpu.experimental.embedding.TPUEmbedding via the optimizer argument to set the global optimizer and its parameters:

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    ...
    optimizer=tf.tpu.experimental.embedding.Adagrad(0.1))

This can also be used in a tf.tpu.experimental.embedding.TableConfig as the optimizer parameter to set a table specific optimizer. This will override the optimizer and parameters for global embedding optimizer defined above:

table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...,
    optimizer=tf.tpu.experimental.embedding.Adagrad(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)

feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_two))

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=...
    optimizer=tf.tpu.experimental.embedding.Adagrad(0.1))

In the above example, the first feature will be looked up in a table that has a learning rate of 0.2 while the second feature will be looked up in a table that has a learning rate of 0.1.

See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a complete description of these parameters and their impacts on the optimizer algorithm.

learning_rate The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate.
initial_accumulator_value initial accumulator for Adagrad.
use_gradient_accumulation setting this to False makes embedding gradients calculation less accurate but faster.
clip_weight_min the minimum value to clip by; None means -infinity.
clip_weight_max the maximum value to clip by; None means +infinity.
weight_decay_factor amount of weight decay to apply; None means that the weights are not decayed.
multiply_weight_decay_factor_by_learning_rate if true, weight_decay_factor is multiplied by the current learning rate.
slot_variable_creation_fn Defaults to None. If you wish do directly control the creation of the slot variables, set this to a callable taking two parameters, a variable and a list of slot names to create for it. This function should return a dict with the slot names as keys and the created variables as values. When set to None (the default), uses the built-in variable creation.