Adagrad Dual Averaging algorithm for sparse linear models.

Inherits From: Optimizer

This optimizer handles regularization of features that do not appear in a mini-batch: when such a feature is next seen, it is updated with a closed-form rule that is equivalent to having updated it on every mini-batch.

AdagradDA is typically used when the trained model needs to be highly sparse. This optimizer only guarantees sparsity for linear models. Be careful when using AdagradDA for deep networks, as training then requires careful initialization of the gradient accumulators.
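The behavior described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation. It assumes the dual-averaging weight is a soft-thresholded function of the running gradient sum, the running squared-gradient sum, and the step count. The formula, the names `adagrad_da_weight` and `train`, and all hyperparameter values are assumptions made for this example.

```python
import math


def adagrad_da_weight(g_sum, gg_sum, step, lr, l1, l2):
    """Weight implied by the accumulators at a given step (illustrative sketch)."""
    # Soft-threshold the summed gradient; the L1 threshold grows with the step.
    shrunk = max(abs(g_sum) - l1 * step, 0.0)
    if shrunk == 0.0:
        return 0.0  # exactly zero, which is where the sparsity comes from
    sign = 1.0 if g_sum > 0 else -1.0
    return -sign * lr * shrunk / (l2 * step * lr + math.sqrt(gg_sum))


def train(grads, lazy, lr=0.1, l1=0.05, l2=0.01, init_acc=0.1):
    """Run one feature through a gradient stream; 0.0 means 'unseen in this batch'."""
    g_sum, gg_sum, weight = 0.0, init_acc, 0.0
    for step, g in enumerate(grads, start=1):
        if lazy and g == 0.0:
            continue  # unseen feature: leave accumulators and weight untouched
        g_sum += g
        gg_sum += g * g
        # The weight depends only on the accumulators and the current step,
        # so skipped steps are accounted for the next time the feature is seen.
        weight = adagrad_da_weight(g_sum, gg_sum, step, lr, l1, l2)
    return weight


grads = [0.5, 0.0, 0.0, 0.3]  # feature seen at steps 1 and 4 only
eager_weight = train(grads, lazy=False)   # updated on every mini-batch
lazy_weight = train(grads, lazy=True)     # updated only when seen
strong_l1_weight = train(grads, lazy=True, l1=1.0)
print(eager_weight, lazy_weight, strong_l1_weight)
```

In this sketch the eager and lazy runs agree at the last step where the feature is seen, because the weight is recomputed from the accumulators and the step count rather than adjusted incrementally. With a large enough L1 strength the thresholded gradient sum is clipped to zero, so the weight is exactly zero rather than merely small.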


Adaptive Subgradient Methods for Online Learning and Stochastic Optimization: Duchi et al., 2011

learning_rate A Tensor or a floating point value. The learning rate.
global_step A Tensor containing the current training step number.
initial_gradient_squared_accumulator_value A floating point value. Starting value for the accumulators; must be positive.
l1_regularization_strength A float value; must be greater than or equal to zero.
l2_regularization_strength A float value; must be greater than or equal to zero.
use_locking If True, use locks for update operations.
name Optional name prefix for the operations created when applying gradients. Defaults to "AdagradDA".

ValueError If the initial_gradient_squared_accumulator_value is invalid (that is, not positive).
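The documented constraints on the arguments can be checked before constructing the optimizer. The helper below is a hypothetical sketch, not part of the library. Its name, `check_adagrad_da_args`, is invented for this example. The source only documents a ValueError for the accumulator value, so raising on negative regularization strengths is an assumption of this sketch.

```python
def check_adagrad_da_args(initial_accumulator_value, l1_strength=0.0, l2_strength=0.0):
    """Validate AdagradDA-style arguments (hypothetical helper, not a library call)."""
    # Documented: the starting accumulator value must be positive.
    if initial_accumulator_value <= 0.0:
        raise ValueError(
            "initial_gradient_squared_accumulator_value must be positive, got %r"
            % (initial_accumulator_value,)
        )
    # Assumption of this sketch: also reject negative regularization strengths,
    # which the argument descriptions say must be greater than or equal to zero.
    if l1_strength < 0.0 or l2_strength < 0.0:
        raise ValueError("regularization strengths must be >= 0")


check_adagrad_da_args(0.1, l1_strength=0.001)  # valid arguments pass silently

try:
    check_adagrad_da_args(0.0)
    rejected_zero_accumulator = False
except ValueError:
    rejected_zero_accumulator = True
```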