The following classes are available globally.
-
A mutable, shareable, owning reference to a tensor.
Declaration
public final class Parameter<Scalar> where Scalar : TensorFlowScalar
extension Parameter: CopyableToDevice
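For illustration, a short sketch of how a Parameter can be shared between references. It assumes the class exposes its tensor through a value property and an init(_:) taking a tensor; treat those member names as assumptions, not a definitive API listing.
import TensorFlow

// Sketch only: `value` and `init(_:)` are assumed member names.
let shared = Parameter(Tensor<Float>([1, 2, 3]))
let alias = shared                         // a second reference to the same instance
alias.value = Tensor<Float>([4, 5, 6])     // mutate through one reference...
print(shared.value)                        // ...and the other reference observes [4.0, 5.0, 6.0]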
-
Class wrapping a C pointer to a TensorHandle. This class owns the TensorHandle and is responsible for destroying it.
Declaration
public class TFETensorHandle : _AnyTensorHandle
extension TFETensorHandle: Equatable
-
An RMSProp optimizer.
Implements the RMSProp optimization algorithm. RMSProp is a form of stochastic gradient descent where the gradients are divided by a running average of their recent magnitude. RMSProp keeps a moving average of the squared gradient for each weight.
References:
- “Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude” (Tieleman and Hinton, 2012)
- “Generating Sequences With Recurrent Neural Networks” (Graves, 2013)
Declaration
public class RMSProp<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
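As a rough illustration of the update rule described above, here is a minimal scalar sketch in plain Swift; it is not the library's implementation, and the struct name and hyperparameter defaults are illustrative only.
// Scalar sketch of one RMSProp step; the library applies the same idea elementwise per weight.
struct ScalarRMSProp {
    var learningRate: Float = 1e-3
    var rho: Float = 0.9               // decay rate of the running average
    var epsilon: Float = 1e-8
    private var meanSquaredGradient: Float = 0

    mutating func update(_ weight: inout Float, gradient: Float) {
        // Moving average of the squared gradient.
        meanSquaredGradient = rho * meanSquaredGradient + (1 - rho) * gradient * gradient
        // Divide the gradient by the root of that running average.
        weight -= learningRate * gradient / (meanSquaredGradient.squareRoot() + epsilon)
    }
}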
-
An AdaGrad optimizer.
Implements the AdaGrad (adaptive gradient) optimization algorithm. AdaGrad has parameter-specific learning rates, which are adapted relative to how frequently parameters get updated during training. Parameters that receive more updates have smaller learning rates.
AdaGrad individually adapts the learning rates of all model parameters by scaling them inversely proportional to the square root of the running sum of squares of gradient norms.
Reference: “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization” (Duchi et al., 2011)
Declaration
public class AdaGrad<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
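The inverse-square-root scaling described above can be sketched per weight as follows; this is a plain-Swift illustration, not the library's implementation, and the defaults are arbitrary.
// Scalar sketch of one AdaGrad step.
struct ScalarAdaGrad {
    var learningRate: Float = 1e-2
    var epsilon: Float = 1e-8
    private var sumSquaredGradients: Float = 0   // running sum of squared gradients

    mutating func update(_ weight: inout Float, gradient: Float) {
        sumSquaredGradients += gradient * gradient
        // The effective step size shrinks as the accumulator grows, so frequently
        // updated parameters receive smaller learning rates.
        weight -= learningRate * gradient / (sumSquaredGradients.squareRoot() + epsilon)
    }
}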
-
An AdaDelta optimizer.
Implements the AdaDelta optimization algorithm. AdaDelta is a stochastic gradient descent method based on first-order information. It adapts learning rates based on a moving window of gradient updates instead of accumulating all past gradients, so AdaDelta continues learning even after many updates have been made and adapts more quickly to the changing dynamics of the optimization problem.
Reference: “ADADELTA: An Adaptive Learning Rate Method” (Zeiler, 2012)
Declaration
public class AdaDelta<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
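A minimal scalar sketch of the moving-window idea described above, in plain Swift; this is not the library's implementation, and names and defaults are illustrative.
// Scalar sketch of one AdaDelta step: the step size is derived from two decaying
// averages (squared gradients and squared updates) rather than from all past gradients.
struct ScalarAdaDelta {
    var rho: Float = 0.95
    var epsilon: Float = 1e-6
    private var avgSquaredGradient: Float = 0
    private var avgSquaredUpdate: Float = 0

    mutating func update(_ weight: inout Float, gradient: Float) {
        avgSquaredGradient = rho * avgSquaredGradient + (1 - rho) * gradient * gradient
        let step = -((avgSquaredUpdate + epsilon).squareRoot() /
            (avgSquaredGradient + epsilon).squareRoot()) * gradient
        avgSquaredUpdate = rho * avgSquaredUpdate + (1 - rho) * step * step
        weight += step
    }
}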
-
Adam optimizer.
Implements the Adam optimization algorithm. Adam is a stochastic gradient descent method that computes individual adaptive learning rates for different parameters from estimates of first- and second-order moments of the gradients.
Reference: “Adam: A Method for Stochastic Optimization” (Kingma and Ba, 2014).
Examples:
- Train a simple reinforcement learning agent:
...
// Instantiate an agent's policy, approximated by the neural network (`net`) after defining it
// in advance.
var net = Net(observationSize: Int(observationSize), hiddenSize: hiddenSize, actionCount: actionCount)
// Define the Adam optimizer for the network with a learning rate set to 0.01.
let optimizer = Adam(for: net, learningRate: 0.01)
...
// Begin training the agent (over a certain number of episodes).
while true {
    ...
    // Implementing the gradient descent with the Adam optimizer:
    // Define the gradients (use withLearningPhase to call a closure under a learning phase).
    let gradients = withLearningPhase(.training) {
        TensorFlow.gradient(at: net) { net -> Tensor<Float> in
            // Return the softmax cross-entropy loss.
            return softmaxCrossEntropy(logits: net(input), probabilities: target)
        }
    }
    // Update the differentiable variables of the network (`net`) along the gradients with the
    // Adam optimizer.
    optimizer.update(&net, along: gradients)
    ...
}
- Train a generative adversarial network (GAN):
...
// Instantiate the generator and the discriminator networks after defining them.
var generator = Generator()
var discriminator = Discriminator()
// Define the Adam optimizers for each network, with the learning rate set to 2e-4 and beta1 set to 0.5.
let adamOptimizerG = Adam(for: generator, learningRate: 2e-4, beta1: 0.5)
let adamOptimizerD = Adam(for: discriminator, learningRate: 2e-4, beta1: 0.5)
...
// Start the training loop over a certain number of epochs (`epochCount`).
for epoch in 1...epochCount {
    // Start the training phase.
    ...
    for batch in trainingShuffled.batched(batchSize) {
        // Implementing the gradient descent with the Adam optimizer:
        // 1) Update the generator.
        ...
        let 𝛁generator = TensorFlow.gradient(at: generator) { generator -> Tensor<Float> in
            ...
            return loss
        }
        // Update the differentiable variables of the generator along the gradients (`𝛁generator`)
        // with the Adam optimizer.
        adamOptimizerG.update(&generator, along: 𝛁generator)

        // 2) Update the discriminator.
        ...
        let 𝛁discriminator = TensorFlow.gradient(at: discriminator) { discriminator -> Tensor<Float> in
            ...
            return loss
        }
        // Update the differentiable variables of the discriminator along the gradients (`𝛁discriminator`)
        // with the Adam optimizer.
        adamOptimizerD.update(&discriminator, along: 𝛁discriminator)
    }
}
Declaration
public class Adam<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
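To make the first- and second-order moment estimates concrete, a minimal scalar sketch follows; it is plain Swift, not the library's implementation, and names and defaults are illustrative.
// Scalar sketch of one Adam step with bias-corrected moment estimates.
struct ScalarAdam {
    var learningRate: Float = 1e-3
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    private var m: Float = 0            // first-moment (mean) estimate
    private var v: Float = 0            // second-moment (uncentered variance) estimate
    private var beta1Power: Float = 1   // beta1^t, accumulated to avoid pow
    private var beta2Power: Float = 1   // beta2^t

    mutating func update(_ weight: inout Float, gradient: Float) {
        beta1Power *= beta1
        beta2Power *= beta2
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient * gradient
        let mHat = m / (1 - beta1Power)   // bias correction
        let vHat = v / (1 - beta2Power)
        weight -= learningRate * mHat / (vHat.squareRoot() + epsilon)
    }
}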
-
AdaMax optimizer.
A variant of Adam based on the infinity-norm.
Reference: Section 7 of “Adam: A Method for Stochastic Optimization” (Kingma and Ba, 2014)
Declaration
public class AdaMax<Model: Differentiable & KeyPathIterable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
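A minimal scalar sketch of the infinity-norm variant, in plain Swift; this is not the library's implementation, and names and defaults are illustrative.
// Scalar sketch of one AdaMax step: Adam's second-moment estimate is replaced by an
// exponentially weighted infinity norm (a running maximum) of the gradient magnitudes.
struct ScalarAdaMax {
    var learningRate: Float = 2e-3
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    private var m: Float = 0             // first-moment estimate
    private var u: Float = 0             // exponentially weighted infinity norm
    private var beta1Power: Float = 1    // beta1^t

    mutating func update(_ weight: inout Float, gradient: Float) {
        beta1Power *= beta1
        m = beta1 * m + (1 - beta1) * gradient
        u = max(beta2 * u, abs(gradient))
        // epsilon guards against division by zero before any gradient has been seen.
        weight -= learningRate * (m / (1 - beta1Power)) / (u + epsilon)
    }
}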
-
AMSGrad optimizer.
This algorithm is a modification of Adam with better convergence properties when close to local optima.
Reference: “On the Convergence of Adam and Beyond” (Reddi et al., 2018)
Declaration
public class AMSGrad<Model: Differentiable & KeyPathIterable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
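A minimal scalar sketch of the AMSGrad modification, in plain Swift; this is not the library's implementation, and names and defaults are illustrative.
// Scalar sketch of one AMSGrad step: unlike Adam, the denominator uses the historical
// maximum of the second-moment estimate, so the effective step size never grows.
struct ScalarAMSGrad {
    var learningRate: Float = 1e-3
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    private var m: Float = 0         // first-moment estimate
    private var v: Float = 0         // second-moment estimate
    private var vMax: Float = 0      // running maximum of the second moment

    mutating func update(_ weight: inout Float, gradient: Float) {
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient * gradient
        vMax = max(vMax, v)
        weight -= learningRate * m / (vMax.squareRoot() + epsilon)
    }
}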
-
RAdam optimizer.
Rectified Adam, a variant of Adam that introduces a term to rectify the adaptive learning rate variance.
Reference: “On the Variance of the Adaptive Learning Rate and Beyond” (Liu et al., 2019)
Declaration
public class RAdam<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
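A minimal scalar sketch of the rectification described above, in plain Swift; this is not the library's implementation, and names, defaults, and the rho > 4 threshold follow the published algorithm rather than this API.
// Scalar sketch of one RAdam step: early in training, when the variance of the adaptive
// term is not yet trustworthy, the step falls back to a momentum-only update.
struct ScalarRAdam {
    var learningRate: Float = 1e-3
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    private var m: Float = 0            // first-moment estimate
    private var v: Float = 0            // second-moment estimate
    private var beta1Power: Float = 1   // beta1^t
    private var beta2Power: Float = 1   // beta2^t
    private var step: Float = 0

    mutating func update(_ weight: inout Float, gradient: Float) {
        step += 1
        beta1Power *= beta1
        beta2Power *= beta2
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient * gradient
        let mHat = m / (1 - beta1Power)
        // Length of the approximated simple moving average of squared gradients.
        let rhoInf = 2 / (1 - beta2) - 1
        let rho = rhoInf - 2 * step * beta2Power / (1 - beta2Power)
        if rho > 4 {
            // Variance of the adaptive term is tractable: rectify it and use it.
            let vHat = (v / (1 - beta2Power)).squareRoot()
            let rectification = (((rho - 4) * (rho - 2) * rhoInf) /
                ((rhoInf - 4) * (rhoInf - 2) * rho)).squareRoot()
            weight -= learningRate * rectification * mHat / (vHat + epsilon)
        } else {
            // Too early in training: take an un-adapted, momentum-only step.
            weight -= learningRate * mHat
        }
    }
}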
-
An infinite sequence of collections of sample batches suitable for training a DNN when samples are not uniformly sized.
The batches in each epoch:
- all have exactly the same number of samples.
- are formed from samples of similar size.
- start with a batch whose maximum sample size is the maximum size over all samples used in the epoch.
Declaration
public final class NonuniformTrainingEpochs<Samples: Collection, Entropy: RandomNumberGenerator>: Sequence, IteratorProtocol
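A hedged usage sketch follows. The initializer shown, with batchSize, entropy, batchesPerSort, and areInAscendingSizeOrder parameters, is an assumption based on the description above; check the class's actual initializer before relying on it.
import TensorFlow

// Assumed parameter names; treat this as a sketch of intent, not a definitive API listing.
let sentences: [[Int32]] = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
let epochs = NonuniformTrainingEpochs(
    samples: sentences,
    batchSize: 2,
    entropy: SystemRandomNumberGenerator(),
    batchesPerSort: 10,
    areInAscendingSizeOrder: { $0.count < $1.count }   // how to compare sample sizes
)

// Each epoch is a collection of same-sized batches built from samples of similar size.
for batches in epochs.prefix(2) {
    for batch in batches {
        // Pad each batch to the size of its largest sample and run a training step here.
        _ = batch
    }
}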
-
A general optimizer that can express multiple possible optimizations. The optimizer is composed of a mapping from ParameterGroup to ParameterGroupOptimizer. It also tracks the number of elements participating in a cross-replica sum, which avoids multiple inefficient iterations over the gradient.
Declaration
public class GeneralOptimizer<Model: EuclideanDifferentiable>: Optimizer where Model.TangentVector: VectorProtocol & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
-
A stochastic gradient descent (SGD) optimizer.
Implements the stochastic gradient descent algorithm with support for momentum, learning rate decay, and Nesterov momentum. Momentum and Nesterov momentum (a.k.a. the Nesterov accelerated gradient method) are first-order optimization methods that can improve the training speed and convergence rate of gradient descent.
References:
- “A Stochastic Approximation Method” (Robbins and Monro, 1951)
- “On the Stochastic Approximation Method of Robbins and Monro” (Wolfowitz, 1952)
- “Stochastic Estimation of the Maximum of a Regression Function” (Kiefer and Wolfowitz, 1952)
- “Some methods of speeding up the convergence of iteration methods” (Polyak, 1964)
- “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)” (Nesterov, 1983)
Declaration
public class SGD<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
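A minimal scalar sketch of the momentum and Nesterov-momentum variants described above, in plain Swift; this is not the library's implementation, and it follows the common velocity-based formulation.
// Scalar sketch of one SGD step with optional momentum or Nesterov momentum.
struct ScalarSGD {
    var learningRate: Float = 0.01
    var momentum: Float = 0.9
    var nesterov: Bool = false
    private var velocity: Float = 0

    mutating func update(_ weight: inout Float, gradient: Float) {
        // Accumulate a velocity that smooths successive gradients.
        velocity = momentum * velocity - learningRate * gradient
        if nesterov {
            // Nesterov momentum applies the velocity "looking ahead" of the current point.
            weight += momentum * velocity - learningRate * gradient
        } else {
            weight += velocity
        }
    }
}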
-
An infinite sequence of collections of sample batches suitable for training a DNN when samples are uniformly sized.
The batches in each epoch all have exactly the same size.
Declaration
public final class TrainingEpochs<Samples: Collection, Entropy: RandomNumberGenerator>: Sequence, IteratorProtocol
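A hedged usage sketch, assuming an initializer of the form TrainingEpochs(samples:batchSize:entropy:); the parameter names are assumptions, and the loop variable names are illustrative.
import TensorFlow

// Assumed initializer shape; verify against the actual API before use.
let samples = Array(0..<100)
let epochs = TrainingEpochs(
    samples: samples,
    batchSize: 32,
    entropy: SystemRandomNumberGenerator()
)

// The sequence is infinite, so bound it with `prefix`.
for (epochIndex, batches) in epochs.prefix(3).enumerated() {
    for batch in batches {
        // All batches in an epoch have the same number of samples.
        print("epoch \(epochIndex): batch of \(batch.count) samples")
    }
}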
-
Declaration
public class EpochPipelineQueue