tf.compat.v1.tpu.experimental.AdamParameters
Optimization parameters for Adam with TPU embeddings.
tf.compat.v1.tpu.experimental.AdamParameters(
    learning_rate: float,
    beta1: float = 0.9,
    beta2: float = 0.999,
    epsilon: float = 1e-08,
    lazy_adam: bool = True,
    sum_inside_sqrt: bool = True,
    use_gradient_accumulation: bool = True,
    clip_weight_min: Optional[float] = None,
    clip_weight_max: Optional[float] = None,
    weight_decay_factor: Optional[float] = None,
    multiply_weight_decay_factor_by_learning_rate: Optional[bool] = None,
    clip_gradient_min: Optional[float] = None,
    clip_gradient_max: Optional[float] = None
)
Pass this to tf.estimator.tpu.experimental.EmbeddingConfigSpec via the optimization_parameters argument to set the optimizer and its parameters. See the documentation for tf.estimator.tpu.experimental.EmbeddingConfigSpec for more details.
estimator = tf.estimator.tpu.TPUEstimator(
    ...
    embedding_config_spec=tf.estimator.tpu.experimental.EmbeddingConfigSpec(
        ...
        optimization_parameters=tf.tpu.experimental.AdamParameters(0.1),
        ...))
Args
  learning_rate: A floating point value. The learning rate.
  beta1: A float value. The exponential decay rate for the 1st moment estimates.
  beta2: A float value. The exponential decay rate for the 2nd moment estimates.
  epsilon: A small constant for numerical stability.
  lazy_adam: Use lazy Adam instead of Adam. Lazy Adam trains faster. See optimization_parameters.proto for details.
  sum_inside_sqrt: This improves training speed. Please see optimization_parameters.proto for details.
  use_gradient_accumulation: Setting this to False makes the embedding gradient calculation less accurate but faster. Please see optimization_parameters.proto for details.
  clip_weight_min: The minimum value to clip weights by; None means -infinity.
  clip_weight_max: The maximum value to clip weights by; None means +infinity.
  weight_decay_factor: Amount of weight decay to apply; None means the weights are not decayed.
  multiply_weight_decay_factor_by_learning_rate: If true, weight_decay_factor is multiplied by the current learning rate.
  clip_gradient_min: The minimum value to clip gradients by; None means -infinity. Gradient accumulation must be set to True if this is set.
  clip_gradient_max: The maximum value to clip gradients by; None means +infinity. Gradient accumulation must be set to True if this is set.
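As an illustrative sketch (not part of the original reference), the constructor call below combines gradient clipping with weight decay; the hyperparameter values are arbitrary, and use_gradient_accumulation is left at its default of True because gradient clipping requires it:

adam_params = tf.compat.v1.tpu.experimental.AdamParameters(
    learning_rate=0.01,
    beta1=0.9,
    beta2=0.999,
    epsilon=1e-08,
    lazy_adam=True,
    use_gradient_accumulation=True,  # must remain True when clip_gradient_min/max are set
    clip_gradient_min=-1.0,
    clip_gradient_max=1.0,
    weight_decay_factor=1e-4,
    multiply_weight_decay_factor_by_learning_rate=True)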