The following classes are available globally.
-
A mutable, shareable, owning reference to a tensor.
Declaration
public final class Parameter<Scalar> where Scalar : TensorFlowScalar
extension Parameter: CopyableToDevice
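For illustration, a short sketch of how a Parameter can be shared between references. It assumes the class exposes its tensor through a value property and an init(_:) taking a tensor; treat those member names as assumptions, not a definitive API listing.
import TensorFlow

// Sketch only: `value` and `init(_:)` are assumed member names.
let shared = Parameter(Tensor<Float>([1, 2, 3]))
let alias = shared                         // a second reference to the same instance
alias.value = Tensor<Float>([4, 5, 6])     // mutate through one reference...
print(shared.value)                        // ...and the other reference observes [4.0, 5.0, 6.0]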
-
Class wrapping a C pointer to a TensorHandle. This class owns the TensorHandle and is responsible for destroying it.
Declaration
public class TFETensorHandle : _AnyTensorHandle
extension TFETensorHandle: Equatable
-
An RMSProp optimizer.
Implements the RMSProp optimization algorithm. RMSProp is a form of stochastic gradient descent where the gradients are divided by a running average of their recent magnitude. RMSProp keeps a moving average of the squared gradient for each weight.
References:
- “Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude” (Tieleman and Hinton, 2012)
- “Generating Sequences With Recurrent Neural Networks” (Graves, 2013)
Declaration
public class RMSProp<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
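As a rough illustration of the update rule described above, here is a minimal scalar sketch in plain Swift; it is not the library's implementation, and the struct name and hyperparameter defaults are illustrative only.
// Scalar sketch of one RMSProp step; the library applies the same idea elementwise per weight.
struct ScalarRMSProp {
    var learningRate: Float = 1e-3
    var rho: Float = 0.9               // decay rate of the running average
    var epsilon: Float = 1e-8
    private var meanSquaredGradient: Float = 0

    mutating func update(_ weight: inout Float, gradient: Float) {
        // Moving average of the squared gradient.
        meanSquaredGradient = rho * meanSquaredGradient + (1 - rho) * gradient * gradient
        // Divide the gradient by the root of that running average.
        weight -= learningRate * gradient / (meanSquaredGradient.squareRoot() + epsilon)
    }
}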
-
An AdaGrad optimizer.
Implements the AdaGrad (adaptive gradient) optimization algorithm. AdaGrad has parameter-specific learning rates, which are adapted relative to how frequently parameters get updated during training. Parameters that receive more updates have smaller learning rates.
AdaGrad individually adapts the learning rates of all model parameters by scaling them inversely proportional to the square root of the running sum of squares of gradient norms.
Reference: “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization” (Duchi et al., 2011)
Declaration
public class AdaGrad<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
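The inverse-square-root scaling described above can be sketched per weight as follows; this is a plain-Swift illustration, not the library's implementation, and the defaults are arbitrary.
// Scalar sketch of one AdaGrad step.
struct ScalarAdaGrad {
    var learningRate: Float = 1e-2
    var epsilon: Float = 1e-8
    private var sumSquaredGradients: Float = 0   // running sum of squared gradients

    mutating func update(_ weight: inout Float, gradient: Float) {
        sumSquaredGradients += gradient * gradient
        // The effective step size shrinks as the accumulator grows, so frequently
        // updated parameters receive smaller learning rates.
        weight -= learningRate * gradient / (sumSquaredGradients.squareRoot() + epsilon)
    }
}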
-
An AdaDelta optimizer.
Implements the AdaDelta optimization algorithm. AdaDelta is a stochastic gradient descent method based on first-order information. It adapts learning rates based on a moving window of gradient updates instead of accumulating all past gradients, so AdaDelta continues learning even after many updates have been made and adapts more quickly to the changing dynamics of the optimization problem.
Reference: “ADADELTA: An Adaptive Learning Rate Method” (Zeiler, 2012)
Declaration
public class AdaDelta<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
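A minimal scalar sketch of the moving-window idea described above, in plain Swift; this is not the library's implementation, and names and defaults are illustrative.
// Scalar sketch of one AdaDelta step: the step size is derived from two decaying
// averages (squared gradients and squared updates) rather than from all past gradients.
struct ScalarAdaDelta {
    var rho: Float = 0.95
    var epsilon: Float = 1e-6
    private var avgSquaredGradient: Float = 0
    private var avgSquaredUpdate: Float = 0

    mutating func update(_ weight: inout Float, gradient: Float) {
        avgSquaredGradient = rho * avgSquaredGradient + (1 - rho) * gradient * gradient
        let step = -((avgSquaredUpdate + epsilon).squareRoot() /
            (avgSquaredGradient + epsilon).squareRoot()) * gradient
        avgSquaredUpdate = rho * avgSquaredUpdate + (1 - rho) * step * step
        weight += step
    }
}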
-
Adam optimizer.
Implements the Adam optimization algorithm. Adam is a stochastic gradient descent method that computes individual adaptive learning rates for different parameters from estimates of first- and second-order moments of the gradients.
Reference: “Adam: A Method for Stochastic Optimization” (Kingma and Ba, 2014).
Examples:
- Train a simple reinforcement learning agent:
...
// Instantiate an agent's policy, approximated by the neural network (`net`) after defining it
// in advance.
var net = Net(observationSize: Int(observationSize), hiddenSize: hiddenSize, actionCount: actionCount)
// Define the Adam optimizer for the network with a learning rate set to 0.01.
let optimizer = Adam(for: net, learningRate: 0.01)
...
// Begin training the agent (over a certain number of episodes).
while true {
    ...
    // Implementing the gradient descent with the Adam optimizer:
    // Define the gradients (use withLearningPhase to call a closure under a learning phase).
    let gradients = withLearningPhase(.training) {
        TensorFlow.gradient(at: net) { net -> Tensor<Float> in
            // Return the softmax cross-entropy loss.
            return softmaxCrossEntropy(logits: net(input), probabilities: target)
        }
    }
    // Update the differentiable variables of the network (`net`) along the gradients with the
    // Adam optimizer.
    optimizer.update(&net, along: gradients)
    ...
}
- Train a generative adversarial network (GAN):
...
// Instantiate the generator and the discriminator networks after defining them.
var generator = Generator()
var discriminator = Discriminator()
// Define the Adam optimizers for each network, with the learning rate set to 2e-4 and beta1 set to 0.5.
let adamOptimizerG = Adam(for: generator, learningRate: 2e-4, beta1: 0.5)
let adamOptimizerD = Adam(for: discriminator, learningRate: 2e-4, beta1: 0.5)
...
// Start the training loop over a certain number of epochs (`epochCount`).
for epoch in 1...epochCount {
    // Start the training phase.
    ...
    for batch in trainingShuffled.batched(batchSize) {
        // Implementing the gradient descent with the Adam optimizer:
        // 1) Update the generator.
        ...
        let 𝛁generator = TensorFlow.gradient(at: generator) { generator -> Tensor<Float> in
            ...
            return loss
        }
        // Update the differentiable variables of the generator along the gradients (`𝛁generator`)
        // with the Adam optimizer.
        adamOptimizerG.update(&generator, along: 𝛁generator)

        // 2) Update the discriminator.
        ...
        let 𝛁discriminator = TensorFlow.gradient(at: discriminator) { discriminator -> Tensor<Float> in
            ...
            return loss
        }
        // Update the differentiable variables of the discriminator along the gradients (`𝛁discriminator`)
        // with the Adam optimizer.
        adamOptimizerD.update(&discriminator, along: 𝛁discriminator)
    }
}
Declaration
public class Adam<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
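To make the first- and second-order moment estimates concrete, a minimal scalar sketch follows; it is plain Swift, not the library's implementation, and names and defaults are illustrative.
// Scalar sketch of one Adam step with bias-corrected moment estimates.
struct ScalarAdam {
    var learningRate: Float = 1e-3
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    private var m: Float = 0            // first-moment (mean) estimate
    private var v: Float = 0            // second-moment (uncentered variance) estimate
    private var beta1Power: Float = 1   // beta1^t, accumulated to avoid pow
    private var beta2Power: Float = 1   // beta2^t

    mutating func update(_ weight: inout Float, gradient: Float) {
        beta1Power *= beta1
        beta2Power *= beta2
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient * gradient
        let mHat = m / (1 - beta1Power)   // bias correction
        let vHat = v / (1 - beta2Power)
        weight -= learningRate * mHat / (vHat.squareRoot() + epsilon)
    }
}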
-
AdaMax optimizer.
A variant of Adam based on the infinity-norm.
Reference: Section 7 of “Adam: A Method for Stochastic Optimization” (Kingma and Ba, 2014)
Declaration
public class AdaMax<Model: Differentiable & KeyPathIterable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
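A minimal scalar sketch of the infinity-norm variant, in plain Swift; this is not the library's implementation, and names and defaults are illustrative.
// Scalar sketch of one AdaMax step: Adam's second-moment estimate is replaced by an
// exponentially weighted infinity norm (a running maximum) of the gradient magnitudes.
struct ScalarAdaMax {
    var learningRate: Float = 2e-3
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    private var m: Float = 0             // first-moment estimate
    private var u: Float = 0             // exponentially weighted infinity norm
    private var beta1Power: Float = 1    // beta1^t

    mutating func update(_ weight: inout Float, gradient: Float) {
        beta1Power *= beta1
        m = beta1 * m + (1 - beta1) * gradient
        u = max(beta2 * u, abs(gradient))
        // epsilon guards against division by zero before any gradient has been seen.
        weight -= learningRate * (m / (1 - beta1Power)) / (u + epsilon)
    }
}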
-
AMSGrad optimizer.
This algorithm is a modification of Adam with better convergence properties when close to local optima.
Reference: “On the Convergence of Adam and Beyond” (Reddi et al., 2018)
Declaration
public class AMSGrad<Model: Differentiable & KeyPathIterable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
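A minimal scalar sketch of the AMSGrad modification, in plain Swift; this is not the library's implementation, and names and defaults are illustrative.
// Scalar sketch of one AMSGrad step: unlike Adam, the denominator uses the historical
// maximum of the second-moment estimate, so the effective step size never grows.
struct ScalarAMSGrad {
    var learningRate: Float = 1e-3
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    private var m: Float = 0         // first-moment estimate
    private var v: Float = 0         // second-moment estimate
    private var vMax: Float = 0      // running maximum of the second moment

    mutating func update(_ weight: inout Float, gradient: Float) {
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient * gradient
        vMax = max(vMax, v)
        weight -= learningRate * m / (vMax.squareRoot() + epsilon)
    }
}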
-
RAdam optimizer.
Rectified Adam, a variant of Adam that introduces a term to rectify the adaptive learning rate variance.
Reference: “On the Variance of the Adaptive Learning Rate and Beyond” (Liu et al., 2019)
Declaration
public class RAdam<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
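A minimal scalar sketch of the rectification described above, in plain Swift; this is not the library's implementation, and names, defaults, and the rho > 4 threshold follow the published algorithm rather than this API.
// Scalar sketch of one RAdam step: early in training, when the variance of the adaptive
// term is not yet trustworthy, the step falls back to a momentum-only update.
struct ScalarRAdam {
    var learningRate: Float = 1e-3
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    private var m: Float = 0            // first-moment estimate
    private var v: Float = 0            // second-moment estimate
    private var beta1Power: Float = 1   // beta1^t
    private var beta2Power: Float = 1   // beta2^t
    private var step: Float = 0

    mutating func update(_ weight: inout Float, gradient: Float) {
        step += 1
        beta1Power *= beta1
        beta2Power *= beta2
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient * gradient
        let mHat = m / (1 - beta1Power)
        // Length of the approximated simple moving average of squared gradients.
        let rhoInf = 2 / (1 - beta2) - 1
        let rho = rhoInf - 2 * step * beta2Power / (1 - beta2Power)
        if rho > 4 {
            // Variance of the adaptive term is tractable: rectify it and use it.
            let vHat = (v / (1 - beta2Power)).squareRoot()
            let rectification = (((rho - 4) * (rho - 2) * rhoInf) /
                ((rhoInf - 4) * (rhoInf - 2) * rho)).squareRoot()
            weight -= learningRate * rectification * mHat / (vHat + epsilon)
        } else {
            // Too early in training: take an un-adapted, momentum-only step.
            weight -= learningRate * mHat
        }
    }
}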
-
An infinite sequence of collections of sample batches suitable for training a DNN when samples are not uniformly sized.
The batches in each epoch:
- all have exactly the same number of samples.
- are formed from samples of similar size.
- start with a batch whose maximum sample size is the maximum size over all samples used in the epoch.
Declaration
public final class NonuniformTrainingEpochs<Samples: Collection, Entropy: RandomNumberGenerator>: Sequence, IteratorProtocol
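A hedged usage sketch follows. The initializer shown, with batchSize, entropy, batchesPerSort, and areInAscendingSizeOrder parameters, is an assumption based on the description above; check the class's actual initializer before relying on it.
import TensorFlow

// Assumed parameter names; treat this as a sketch of intent, not a definitive API listing.
let sentences: [[Int32]] = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
let epochs = NonuniformTrainingEpochs(
    samples: sentences,
    batchSize: 2,
    entropy: SystemRandomNumberGenerator(),
    batchesPerSort: 10,
    areInAscendingSizeOrder: { $0.count < $1.count }   // how to compare sample sizes
)

// Each epoch is a collection of same-sized batches built from samples of similar size.
for batches in epochs.prefix(2) {
    for batch in batches {
        // Pad each batch to the size of its largest sample and run a training step here.
        _ = batch
    }
}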
-
A general optimizer that can express multiple possible optimizations. The optimizer is composed of a mapping from ParameterGroup to ParameterGroupOptimizer. It also tracks the number of elements participating in a cross-replica sum, which avoids multiple inefficient iterations over the gradient.
Declaration
public class GeneralOptimizer<Model: EuclideanDifferentiable>: Optimizer where Model.TangentVector: VectorProtocol & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
-
A stochastic gradient descent (SGD) optimizer.
Implements the stochastic gradient descent algorithm with support for momentum, learning rate decay, and Nesterov momentum. Momentum and Nesterov momentum (a.k.a. the Nesterov accelerated gradient method) are first-order optimization methods that can improve the training speed and convergence rate of gradient descent.
References:
- “A Stochastic Approximation Method” (Robbins and Monro, 1951)
- “On the Stochastic Approximation Method of Robbins and Monro” (Wolfowitz, 1952)
- “Stochastic Estimation of the Maximum of a Regression Function” (Kiefer and Wolfowitz, 1952)
- “Some methods of speeding up the convergence of iteration methods” (Polyak, 1964)
- “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)” (Nesterov, 1983)
Declaration
public class SGD<Model: Differentiable>: Optimizer where Model.TangentVector: VectorProtocol & ElementaryFunctions & KeyPathIterable, Model.TangentVector.VectorSpaceScalar == Float
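A minimal scalar sketch of the momentum and Nesterov-momentum variants described above, in plain Swift; this is not the library's implementation, and it follows the common velocity-based formulation.
// Scalar sketch of one SGD step with optional momentum or Nesterov momentum.
struct ScalarSGD {
    var learningRate: Float = 0.01
    var momentum: Float = 0.9
    var nesterov: Bool = false
    private var velocity: Float = 0

    mutating func update(_ weight: inout Float, gradient: Float) {
        // Accumulate a velocity that smooths successive gradients.
        velocity = momentum * velocity - learningRate * gradient
        if nesterov {
            // Nesterov momentum applies the velocity "looking ahead" of the current point.
            weight += momentum * velocity - learningRate * gradient
        } else {
            weight += velocity
        }
    }
}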
-
An infinite sequence of collections of sample batches suitable for training a DNN when samples are uniformly sized.
The batches in each epoch all have exactly the same size.
Declaration
public final class TrainingEpochs<Samples: Collection, Entropy: RandomNumberGenerator>: Sequence, IteratorProtocol
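A hedged usage sketch, assuming an initializer of the form TrainingEpochs(samples:batchSize:entropy:); the parameter names are assumptions, and the loop variable names are illustrative.
import TensorFlow

// Assumed initializer shape; verify against the actual API before use.
let samples = Array(0..<100)
let epochs = TrainingEpochs(
    samples: samples,
    batchSize: 32,
    entropy: SystemRandomNumberGenerator()
)

// The sequence is infinite, so bound it with `prefix`.
for (epochIndex, batches) in epochs.prefix(3).enumerated() {
    for batch in batches {
        // All batches in an epoch have the same number of samples.
        print("epoch \(epochIndex): batch of \(batch.count) samples")
    }
}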
-
Declaration
public class EpochPipelineQueue