tf.keras.optimizers.Ftrl

Optimizer that implements the FTRL algorithm.

Inherits From: Optimizer

Main aliases

tf.optimizers.Ftrl

Compat aliases for migration

See Migration guide for more details.

tf.compat.v1.keras.optimizers.Ftrl

tf.keras.optimizers.Ftrl(
    learning_rate=0.001, learning_rate_power=-0.5, initial_accumulator_value=0.1,
    l1_regularization_strength=0.0, l2_regularization_strength=0.0, name='Ftrl',
    l2_shrinkage_regularization_strength=0.0, **kwargs
)

See Algorithm 1 of this paper. This version has support for both online L2 (the L2 penalty given in the paper above) and shrinkage-type L2 (which is the addition of an L2 penalty to the loss function).

Initialization:

$t = 0$

$n_{0} = 0$

$\sigma_{0} = 0$

$z_{0} = 0$

Update ($i$ is the variable index):

$t = t + 1$

$n_{t,i} = n_{t-1,i} + g_{t,i}^{2}$

$\sigma_{t,i} = (\sqrt{n_{t,i}} - \sqrt{n_{t-1,i}}) / \alpha$

$z_{t,i} = z_{t-1,i} + g_{t,i} - \sigma_{t,i} * w_{t,i}$

$w_{t,i} = - ((\beta + \sqrt{n_{t,i}}) / \alpha + \lambda_{2})^{-1} * (z_{t,i} - sgn(z_{t,i}) * \lambda_{1})$ if $|z_{t,i}| > \lambda_{1}$, else $0$

When shrinkage is enabled, the gradient in the update above is replaced with gradient_with_shrinkage. See the documentation of the l2_shrinkage_regularization_strength parameter for details.
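As a rough illustration, the update equations above can be sketched in plain Python for a single coordinate. This is not the TensorFlow kernel: the function name `ftrl_step` is hypothetical, the default for `beta` is an arbitrary illustrative value (it is not a constructor argument on this page), the sketch assumes the weight used in the `z` update is the weight before the step, and it ignores `learning_rate_power`, `initial_accumulator_value` and shrinkage.

```python
import math


def ftrl_step(w, z, n, g, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
    """One FTRL update for a single coordinate (hypothetical helper).

    w, z, n are the current weight, linear term and squared-gradient
    accumulator; g is the gradient for this step.
    """
    n_new = n + g * g
    sigma = (math.sqrt(n_new) - math.sqrt(n)) / alpha
    # Assumption: w here is the weight before this step.
    z_new = z + g - sigma * w
    if abs(z_new) > l1:
        sign = 1.0 if z_new > 0 else -1.0
        w_new = -(z_new - sign * l1) / ((beta + math.sqrt(n_new)) / alpha + l2)
    else:
        # The L1 threshold zeroes the weight, which is what makes FTRL sparse.
        w_new = 0.0
    return w_new, z_new, n_new


# One step from the zero state with gradient 1.0.
w1, z1, n1 = ftrl_step(w=0.0, z=0.0, n=0.0, g=1.0)

# Same step with an L1 strength larger than |z|: the weight stays at zero.
w_sparse, _, _ = ftrl_step(w=0.0, z=0.0, n=0.0, g=1.0, l1=2.0)
```

With the defaults chosen here the first call moves the weight to roughly -0.05, while the second leaves it at exactly zero because the accumulated linear term does not exceed the L1 threshold.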

Args
`learning_rate`	A float value or a constant float `Tensor`.
`learning_rate_power`	A float value, must be less than or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate.
`initial_accumulator_value`	The starting value for accumulators. Only zero or positive values are allowed.
`l1_regularization_strength`	A float value, must be greater than or equal to zero.
`l2_regularization_strength`	A float value, must be greater than or equal to zero.
`name`	Optional name prefix for the operations created when applying gradients. Defaults to "Ftrl".
`l2_shrinkage_regularization_strength`	A float value, must be greater than or equal to zero. This differs from L2 above in that the L2 above is a stabilization penalty, whereas this L2 shrinkage is a magnitude penalty. The FTRL formulation can be written as: w_{t+1} = argmin_w(\hat{g}_{1:t} * w + L1 * ||w||_1 + L2 * ||w||_2^2), where \hat{g} = g + (2 * L2_shrinkage * w), and g is the gradient of the loss function w.r.t. the weights w. Specifically, in the absence of L1 regularization, it is equivalent to the following update rule: w_{t+1} = w_t - lr_t / (1 + 2 * L2 * lr_t) * g_t - 2 * L2_shrinkage * lr_t / (1 + 2 * L2 * lr_t) * w_t, where lr_t is the learning rate at t. When the input is sparse, shrinkage only happens on the active weights.
`**kwargs`	Keyword arguments. Allowed keys are {`clipnorm`, `clipvalue`, `lr`, `decay`}. `clipnorm` clips gradients by norm; `clipvalue` clips gradients by value; `decay` is included for backward compatibility to allow time-inverse decay of the learning rate; `lr` is included for backward compatibility, and `learning_rate` is recommended instead.
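The no-L1 update rule quoted for `l2_shrinkage_regularization_strength` can be written as a small function, which makes the roles of the two L2 terms easier to compare. This is only a sketch: the function name `shrinkage_update` is hypothetical, and it assumes the products in the rule read as `2 * L2 * lr_t` and `2 * L2_shrinkage * lr_t`.

```python
def shrinkage_update(w, g, lr, l2=0.0, l2_shrinkage=0.0):
    """Closed-form step without L1, following the rule stated in the Args table."""
    denom = 1.0 + 2.0 * l2 * lr
    return w - lr / denom * g - 2.0 * l2_shrinkage * lr / denom * w


# With both penalties at zero the rule reduces to a plain gradient step.
plain = shrinkage_update(w=1.0, g=0.5, lr=0.1)

# A positive shrinkage strength adds a pull on the weight proportional to w itself.
shrunk = shrinkage_update(w=1.0, g=0.5, lr=0.1, l2_shrinkage=0.5)
```

In this sketch `l2` only rescales the step through the denominator, whereas `l2_shrinkage` adds a term proportional to the weight, matching the stabilization-versus-magnitude distinction drawn in the parameter description.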

Raises
`ValueError`	If one of the arguments is invalid.
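A short sketch of constructing the optimizer and of the argument validation described above. The specific hyperparameter values are arbitrary, and the exact error message is not shown because it may differ between TensorFlow versions.

```python
import tensorflow as tf

# A valid configuration: default learning-rate power with L1 and L2 penalties.
opt = tf.keras.optimizers.Ftrl(
    learning_rate=0.05,
    learning_rate_power=-0.5,
    l1_regularization_strength=0.01,
    l2_regularization_strength=0.001,
)

# An invalid configuration: learning_rate_power must be less than or equal to zero.
try:
    tf.keras.optimizers.Ftrl(learning_rate_power=0.5)
    raised = False
except ValueError:
    raised = True
```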

Attributes
`iterations`	Variable. The number of training steps this Optimizer has run.
`weights`	Returns variables of this Optimizer based on the order created.

Methods

`add_slot`

add_slot(
    var, slot_name, initializer='zeros'
)

Add a new slot variable for var.

`add_weight`

add_weight(
    name, shape, dtype=None, initializer='zeros', trainable=None,
    synchronization=tf.VariableSynchronization.AUTO,
    aggregation=tf.compat.v1.VariableAggregation.NONE
)

`apply_gradients`

apply_gradients(
    grads_and_vars, name=None
)

Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

Args
`grads_and_vars`	List of (gradient, variable) pairs.
`name`	Optional name for the returned operation. Defaults to the name passed to the `Optimizer` constructor.

Returns
An `Operation` that applies the specified gradients. The `iterations` will be automatically increased by 1.

Raises
`TypeError`	If `grads_and_vars` is malformed.
`ValueError`	If none of the variables have gradients.
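A minimal sketch of the explicit path, computing gradients with `tf.GradientTape` and passing them to `apply_gradients`. The variable, loss and learning rate are illustrative assumptions, and eager execution is assumed.

```python
import tensorflow as tf

opt = tf.keras.optimizers.Ftrl(learning_rate=0.1)
w = tf.Variable([1.0, -2.0])

# Record the forward pass so gradients can be taken with respect to w.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(w))

grads = tape.gradient(loss, [w])

# grads_and_vars is a sequence of (gradient, variable) pairs.
opt.apply_gradients(zip(grads, [w]))

steps = int(opt.iterations.numpy())
```

After this single call `steps` is 1, since `iterations` is incremented once per `apply_gradients` call.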

`from_config`

@classmethod
from_config(
    config, custom_objects=None
)

Creates an optimizer from its config.

This method is the reverse of get_config, capable of instantiating the same optimizer from the config dictionary.

Arguments
`config`	A Python dictionary, typically the output of get_config.
`custom_objects`	A Python dictionary mapping names to additional Python objects used to create this optimizer, such as a function used for a hyperparameter.

Returns
An optimizer instance.

`get_config`

get_config()

Returns the config of the optimizer.

An optimizer config is a Python dictionary (serializable) containing the configuration of an optimizer. The same optimizer can be reinstantiated later (without any saved state) from this configuration.

Returns
Python dictionary.
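The round trip between `get_config` and `from_config` might look like the following sketch. The hyperparameter values are arbitrary, and only the configuration is carried over, not any slot state.

```python
import tensorflow as tf

opt = tf.keras.optimizers.Ftrl(
    learning_rate=0.05,
    l1_regularization_strength=0.01,
)

# A plain Python dictionary describing the hyperparameters.
config = opt.get_config()

# A fresh optimizer with the same hyperparameters and no accumulated state.
restored = tf.keras.optimizers.Ftrl.from_config(config)
```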

`get_gradients`

get_gradients(
    loss, params
)

Returns gradients of loss with respect to params.

Arguments
`loss`	Loss tensor.
`params`	List of variables.

Returns
List of gradient tensors.

Raises
`ValueError`	In case any gradient cannot be computed (e.g. if gradient function not implemented).

`get_slot`

get_slot(
    var, slot_name
)

`get_slot_names`

get_slot_names()

A list of names for this optimizer's slots.

`get_updates`

get_updates(
    loss, params
)

`get_weights`

get_weights()

`minimize`

minimize(
    loss, var_list, grad_loss=None, name=None
)

Minimize loss by updating var_list.

This method computes the gradients using tf.GradientTape and then calls apply_gradients(). To process the gradients before applying them, use tf.GradientTape and apply_gradients() explicitly instead of this method.

Args
`loss`	A callable taking no arguments which returns the value to minimize.
`var_list`	List or tuple of `Variable` objects to update to minimize `loss`, or a callable returning such a list or tuple. Use a callable when the variable list would otherwise be incomplete before `minimize` is called, because the variables are created the first time `loss` is called.
`grad_loss`	Optional. A `Tensor` holding the gradient computed for `loss`.
`name`	Optional name for the returned operation.

Returns
An `Operation` that updates the variables in `var_list`. The `iterations` will be automatically increased by 1.

Raises
`ValueError`	If some of the variables are not `Variable` objects.

`set_weights`

set_weights(
    weights
)

`variables`

variables()

Returns variables of this Optimizer based on the order created.
