tf.tpu.experimental.embedding.FTRL

Optimization parameters for FTRL with TPU embeddings.

Compat aliases for migration

tf.compat.v1.tpu.experimental.embedding.FTRL

tf.tpu.experimental.embedding.FTRL(
    learning_rate: Union[float, Callable[[], float]] = 0.001,
    learning_rate_power: float = -0.5,
    l1_regularization_strength: float = 0.0,
    l2_regularization_strength: float = 0.0,
    beta: float = 0.0,
    initial_accumulator_value: float = 0.1,
    use_gradient_accumulation: bool = True,
    clip_weight_min: Optional[float] = None,
    clip_weight_max: Optional[float] = None,
    weight_decay_factor: Optional[float] = None,
    multiply_weight_decay_factor_by_learning_rate: Optional[bool] = None,
    slot_variable_creation_fn: Optional[SlotVarCreationFnType] = None,
    clipvalue: Optional[ClipValueType] = None,
    multiply_linear_by_learning_rate: bool = False,
    allow_zero_accumulator: bool = False,
    low_dimensional_packing_status: bool = False
)

See Algorithm 1 of this paper.
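As a rough illustration of what the arguments control, the following sketches a per-coordinate FTRL-Proximal update. It assumes the paper referred to is McMahan et al., "Ad Click Prediction: a View from the Trenches", and generalizes the square root in its Algorithm 1 to `learning_rate_power`. The helper names `ftrl_weight` and `ftrl_step` are hypothetical, and the exact formulas used on TPU (for example, constant factors on the L2 term) are defined in `optimization_parameters.proto` and may differ.

```python
import math


def ftrl_weight(z, n, learning_rate, learning_rate_power=-0.5,
                l1=0.0, l2=0.0, beta=0.0):
    """Closed-form weight for one coordinate from its FTRL state (z, n)."""
    if abs(z) <= l1:
        return 0.0  # L1 regularization keeps this coordinate exactly at zero
    # With learning_rate_power = -0.5 this is (beta + sqrt(n)) / learning_rate + l2.
    scale = (beta + n ** (-learning_rate_power)) / learning_rate + l2
    return -(z - math.copysign(l1, z)) / scale


def ftrl_step(z, n, grad, learning_rate, learning_rate_power=-0.5,
              l1=0.0, l2=0.0, beta=0.0):
    """Apply one gradient to the state (z, n) and return the new (z, n)."""
    w = ftrl_weight(z, n, learning_rate, learning_rate_power, l1, l2, beta)
    n_new = n + grad * grad  # accumulator of squared gradients
    sigma = (n_new ** (-learning_rate_power)
             - n ** (-learning_rate_power)) / learning_rate
    z_new = z + grad - sigma * w  # the "linear" term
    return z_new, n_new


# Usage: n starts at the initial accumulator value (0.1 is this class's default).
z, n = 0.0, 0.1
for grad in (2.0, -1.0, 0.5):
    z, n = ftrl_step(z, n, grad, learning_rate=0.1)
print(ftrl_weight(z, n, learning_rate=0.1))
```

In this sketch, `learning_rate_power=0` makes the accumulator term constant, which corresponds to a fixed learning rate, and a larger `l1` widens the range of `z` for which the weight stays at zero.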

Pass an instance of this class to tf.tpu.experimental.embedding.TPUEmbedding via the optimizer argument to set the global optimizer and its parameters:

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    ...
    optimizer=tf.tpu.experimental.embedding.FTRL(0.1))
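
The learning rate does not have to be a constant such as 0.1: `learning_rate` also accepts a callable taking no arguments. The sketch below shows one possible shape for such a callable, a linear warmup. The class `WarmupLearningRate` and its `step` attribute are hypothetical; how and when the library invokes the callable is not specified here, and in a real TPU training loop the value would likely need to come from TensorFlow state (such as a variable) rather than a Python attribute.

```python
class WarmupLearningRate:
    """Hypothetical zero-argument callable that ramps linearly to a peak."""

    def __init__(self, peak, warmup_steps):
        self.peak = peak
        self.warmup_steps = warmup_steps
        self.step = 0  # assumed to be advanced by the training loop

    def __call__(self):
        # Fraction of the warmup completed, capped at 1.0 once warmup ends.
        fraction = min(1.0, self.step / self.warmup_steps)
        return self.peak * fraction


schedule = WarmupLearningRate(peak=0.1, warmup_steps=1000)
# Assumed usage: tf.tpu.experimental.embedding.FTRL(learning_rate=schedule)
```

A ramp like this starts at a learning rate of zero, which is the situation `multiply_linear_by_learning_rate` is documented to handle without producing NaNs.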

An FTRL instance can also be passed to a tf.tpu.experimental.embedding.TableConfig as the optimizer argument to set a table-specific optimizer. This overrides the global embedding optimizer and its parameters defined above:

table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...,
    optimizer=tf.tpu.experimental.embedding.FTRL(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)

feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_two))

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=...,
    optimizer=tf.tpu.experimental.embedding.FTRL(0.1))

In the above example, the first feature will be looked up in a table that has a learning rate of 0.2 while the second feature will be looked up in a table that has a learning rate of 0.1.
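
The override rule described above can be modeled in a few lines of plain Python. This is only a sketch of the documented behavior, not the library's implementation: `FakeFTRL`, `FakeTableConfig`, and `effective_optimizer` are hypothetical stand-ins that do not exist in TensorFlow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeFTRL:
    """Hypothetical stand-in for tf.tpu.experimental.embedding.FTRL."""
    learning_rate: float


@dataclass(frozen=True)
class FakeTableConfig:
    """Hypothetical stand-in for TableConfig; only the optimizer matters here."""
    name: str
    optimizer: Optional[FakeFTRL] = None


def effective_optimizer(table, global_optimizer):
    # A table-specific optimizer wins; otherwise fall back to the global one.
    return table.optimizer if table.optimizer is not None else global_optimizer


global_opt = FakeFTRL(0.1)
table_one = FakeTableConfig("table_one", optimizer=FakeFTRL(0.2))
table_two = FakeTableConfig("table_two")

for table in (table_one, table_two):
    print(table.name, effective_optimizer(table, global_opt).learning_rate)
```

Here `table_one` resolves to a learning rate of 0.2 and `table_two` to the global 0.1, mirroring the example above.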

See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a complete description of these parameters and their impacts on the optimizer algorithm.

Args
`learning_rate`	The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate.
`learning_rate_power`	A float value; must be less than or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate.
`l1_regularization_strength`	A float value, must be greater than or equal to zero.
`l2_regularization_strength`	A float value, must be greater than or equal to zero.
`beta`	A float value, representing the beta value from the paper.
`initial_accumulator_value`	The starting value for accumulators. Only zero or positive values are allowed.
`use_gradient_accumulation`	Setting this to `False` makes the embedding gradient calculation less accurate but faster.
`clip_weight_min`	The minimum value to clip weights by; None means -infinity.
`clip_weight_max`	The maximum value to clip weights by; None means +infinity.
`weight_decay_factor`	Amount of weight decay to apply; None means that the weights are not decayed.
`multiply_weight_decay_factor_by_learning_rate`	If true, `weight_decay_factor` is multiplied by the current learning rate.
`slot_variable_creation_fn`	If you wish to directly control the creation of the slot variables, set this to a callable taking three parameters: a table variable, a list of slot names to create for it, and a list of initializers. This function should return a dict with the slot names as keys and the created variables as values, with types matching the table variable. When set to None (the default), the built-in variable creation is used.
`clipvalue`	Controls clipping of the gradient. Set to either a single positive scalar value to enable clipping, or a tuple of scalar values (min, max) to set a separate minimum and maximum. If one of the two entries is None, there is no clipping in that direction.
`multiply_linear_by_learning_rate`	If set to True, a modified formula is used for FTRL that treats the "linear" accumulator as being pre-multiplied by the learning rate (i.e., the accumulator named "linear" actually stores "linear * learning_rate"). Other than checkpoint compatibility, this is mathematically equivalent for a static learning rate; for a dynamic learning rate, it is nearly the same as long as the learning rate does not change quickly. The benefit of this is that the modified formula handles zero and near-zero learning rates without producing NaNs, improving flexibility for learning rate ramp-up.
`allow_zero_accumulator`	If set to True, changes some internal formulas to allow zero and near-zero accumulator values at the cost of some performance; this only needs to be set if you are using an initial accumulator value of zero, which is uncommon.
`low_dimensional_packing_status`	Status of the low-dimensional embedding packing optimization. Controls whether to optimize the packing of 1-dimensional, 2-dimensional, and 4-dimensional embedding tables in memory.
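
The accepted forms of `clipvalue` can be illustrated with a small sketch that normalizes them to a (min, max) pair. It assumes that a single scalar c clips symmetrically to [-c, c], which the description above does not state explicitly; `normalize_clipvalue` and `clip` are hypothetical helpers, not part of the TensorFlow API.

```python
import math


def normalize_clipvalue(clipvalue):
    """Return (min, max) gradient bounds; infinities mean no clipping."""
    if clipvalue is None:
        return (-math.inf, math.inf)
    if isinstance(clipvalue, tuple):
        low, high = clipvalue
        # A None entry disables clipping in that direction.
        return (-math.inf if low is None else low,
                math.inf if high is None else high)
    # Assumption: a single scalar c clips symmetrically to [-c, c].
    return (-clipvalue, clipvalue)


def clip(value, bounds):
    """Clamp a single value into the closed interval given by bounds."""
    low, high = bounds
    return max(low, min(high, value))


bounds = normalize_clipvalue((None, 2.0))
print(clip(10.0, bounds), clip(-10.0, bounds))
```

The same convention of None meaning an infinite bound applies to `clip_weight_min` and `clip_weight_max`, which bound the weights rather than the gradient.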

Methods

`__eq__`

__eq__(
    other: Any
) -> Union[Any, bool]

Return self==value.
