tff.learning.optimizers.build_yogi
Returns a tff.learning.optimizers.Optimizer for Yogi.
tff.learning.optimizers.build_yogi(
learning_rate: optimizer.Float,
beta_1: optimizer.Float = 0.9,
beta_2: optimizer.Float = 0.999,
epsilon: optimizer.Float = 0.001,
initial_preconditioner_value: optimizer.Float = 1e-06
) -> tff.learning.optimizers.Optimizer
The Yogi optimizer is based on Adaptive methods for nonconvex optimization (https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization.pdf).
The update rule given learning rate lr, epsilon eps, accumulator acc, preconditioner s, iteration t, weights w, and gradients g is:
acc = beta_1 * acc + (1 - beta_1) * g
s = s + (1 - beta_2) * sign(g ** 2 - s) * (g ** 2)
normalized_lr = lr * sqrt(1 - beta_2**t) / (1 - beta_1**t)
w = w - normalized_lr * acc / (sqrt(s) + eps)
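
For illustration, a minimal NumPy sketch of a single update step mirroring the pseudocode above; the function name yogi_step, its default values, and the toy arrays are hypothetical and not part of the TFF API.

import numpy as np

def yogi_step(w, g, acc, s, t, lr=0.1, beta_1=0.9, beta_2=0.999, eps=1e-3):
    """One illustrative Yogi update step, following the pseudocode above."""
    acc = beta_1 * acc + (1 - beta_1) * g               # first-moment accumulator
    s = s + (1 - beta_2) * np.sign(g**2 - s) * g**2     # additive preconditioner update
    normalized_lr = lr * np.sqrt(1 - beta_2**t) / (1 - beta_1**t)  # bias correction
    w = w - normalized_lr * acc / (np.sqrt(s) + eps)    # parameter update
    return w, acc, s

# One step on a toy weight vector (iteration count t starts at 1).
w = np.array([1.0, -2.0])
g = np.array([0.5, 0.3])
acc = np.zeros_like(w)
s = np.full_like(w, 1e-6)   # initial_preconditioner_value
w, acc, s = yogi_step(w, g, acc, s, t=1)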
The implementation of Yogi is based on additive updates, as opposed to multiplicative updates (as in Adam). Experiments show better performance across NLP and vision tasks in both centralized and federated settings.
Typically, use 10x the learning rate used for Adam.
Args:
  learning_rate: A positive float for the learning rate.
  beta_1: A float between 0.0 and 1.0 for the decay used to track the previous gradients.
  beta_2: A float between 0.0 and 1.0 for the decay used to track the magnitude (second moment) of previous gradients.
  epsilon: A constant trading off adaptivity and noise.
  initial_preconditioner_value: The starting value for the preconditioner. Only positive values are allowed.
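
For context, a minimal usage sketch of the returned optimizer via the functional tff.learning.optimizers.Optimizer interface (initialize/next); the toy weight and gradient tensors are illustrative only.

import tensorflow as tf
import tensorflow_federated as tff

optimizer = tff.learning.optimizers.build_yogi(learning_rate=0.1)

# Hypothetical model weights and a single gradient for one optimizer step.
weights = tf.constant([1.0, -2.0])
gradients = tf.constant([0.5, 0.3])

state = optimizer.initialize(tf.TensorSpec(weights.shape, weights.dtype))
state, weights = optimizer.next(state, weights, gradients)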