AdaDelta

public class AdaDelta<Model: Differentiable>: Optimizer
where
  Model.TangentVector: VectorProtocol & PointwiseMultiplicative
    & ElementaryFunctions & KeyPathIterable,
  Model.TangentVector.VectorSpaceScalar == Float

An AdaDelta optimizer.

Implements the AdaDelta optimization algorithm. AdaDelta is a stochastic gradient descent method based on first-order information. Rather than accumulating all past gradients, it adapts learning rates based on a moving window of gradient updates, so learning continues even after many updates have been made. This lets it adapt more quickly to changing dynamics of the optimization problem.

Reference: “ADADELTA: An Adaptive Learning Rate Method” (Zeiler, 2012)
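The accumulator variables below (averageSquared and accumulatedDelta) implement the two decaying averages from the paper. As a minimal sketch, and not the library implementation, the update rule can be written for a single scalar parameter in plain Swift (the quadratic objective f(x) = x² is chosen purely for illustration):

```swift
import Foundation

// Scalar sketch of the AdaDelta update rule (Zeiler, 2012).
// `rho` and `epsilon` correspond to the optimizer's parameters below.
var x = 5.0                  // parameter being optimized
var averageSquared = 0.0     // E[g^2]: decaying average of squared gradients
var accumulatedDelta = 0.0   // E[dx^2]: decaying average of squared updates
let rho = 0.95
let epsilon = 1e-6

// Gradient of the toy objective f(x) = x^2.
func grad(_ x: Double) -> Double { 2 * x }

for _ in 0..<1000 {
    let g = grad(x)
    // Accumulate gradient: E[g^2] <- rho * E[g^2] + (1 - rho) * g^2
    averageSquared = rho * averageSquared + (1 - rho) * g * g
    // Compute update: dx = -(RMS[dx] / RMS[g]) * g
    let dx = -(sqrt(accumulatedDelta + epsilon) / sqrt(averageSquared + epsilon)) * g
    // Accumulate update: E[dx^2] <- rho * E[dx^2] + (1 - rho) * dx^2
    accumulatedDelta = rho * accumulatedDelta + (1 - rho) * dx * dx
    x += dx
}
// x drifts toward the minimum of f at 0
```

Because each step is scaled by the ratio of the two running RMS values, the effective step size is unit-corrected and self-tuning, which is why the default learningRate is simply 1.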

  • The model type whose parameters the optimizer updates.

    Declaration

    public typealias Model = Model
  • The learning rate.

    Declaration

    public var learningRate: Float
  • The decay factor, corresponding to the fraction of gradient to keep at each time step.

    Declaration

    public var rho: Float
  • A small scalar added to the denominator to improve numerical stability.

    Declaration

    public var epsilon: Float
  • The learning rate decay.

    Declaration

    public var decay: Float
  • The current step.

    Declaration

    public var step: Int
  • The accumulated, exponentially decaying average of squared gradients.

    Declaration

    public var averageSquared: Model.TangentVector
  • The accumulated parameter updates.

    Declaration

    public var accumulatedDelta: Model.TangentVector
  • Creates an instance for model.

    Declaration

    public init(
      for model: __shared Model,
      learningRate: Float = 1,
      rho: Float = 0.95,
      epsilon: Float = 1e-6,
      decay: Float = 0
    )

    Parameters

    learningRate

    The learning rate. The default value is 1.

    rho

    The decay factor. The default value is 0.95.

    epsilon

    A small scalar added to the denominator to improve numerical stability. The default value is 1e-6.

    decay

The learning rate decay. The default value is 0.
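A sketch of constructing the optimizer, assuming a Swift for TensorFlow environment; LinearModel is a hypothetical Layer used only for illustration:

```swift
import TensorFlow

// A hypothetical model; any type conforming to Layer works.
struct LinearModel: Layer {
    var dense = Dense<Float>(inputSize: 4, outputSize: 1)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        dense(input)
    }
}

var model = LinearModel()

// All arguments below are the defaults, spelled out for illustration;
// `AdaDelta(for: model)` is equivalent.
let optimizer = AdaDelta(
    for: model,
    learningRate: 1,
    rho: 0.95,
    epsilon: 1e-6,
    decay: 0
)
```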

  • Declaration

    public func update(_ model: inout Model, along direction: Model.TangentVector)
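One training step might look like the following sketch, assuming model, optimizer, and tensors x and y are already defined as in a typical Swift for TensorFlow training loop:

```swift
import TensorFlow

// Compute the loss and the gradient with respect to the model's
// parameters, then apply one AdaDelta step.
let (loss, gradient) = valueWithGradient(at: model) { model -> Tensor<Float> in
    meanSquaredError(predicted: model(x), expected: y)
}
// Mutates `model` in place, applying the AdaDelta step derived
// from `gradient` and the accumulated statistics.
optimizer.update(&model, along: gradient)
print("Loss: \(loss)")
```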
  • Declaration

    public required init(copying other: AdaDelta, to device: Device)