Warning: This project is deprecated. TensorFlow Addons has stopped development;
the project will only provide minimal maintenance releases until May 2024. See
the full announcement on GitHub.

# tfa.optimizers.extend_with_decoupled_weight_decay

Factory function returning an optimizer class with decoupled weight decay.
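Signature:

```
tfa.optimizers.extend_with_decoupled_weight_decay(
    base_optimizer: Type[keras_legacy_optimizer]
) -> Type[keras_legacy_optimizer]
```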
Returns an optimizer class. An instance of the returned class computes the
update step of `base_optimizer` and additionally decays the weights.
E.g., the class returned by
`extend_with_decoupled_weight_decay(tf.keras.optimizers.Adam)` is
equivalent to `tfa.optimizers.AdamW`.
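For illustration, a minimal sketch of that equivalence (the hyperparameter
values are arbitrary):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Both constructions yield an Adam optimizer with decoupled weight decay.
AdamWFromFactory = tfa.optimizers.extend_with_decoupled_weight_decay(
    tf.keras.optimizers.Adam
)
opt_factory = AdamWFromFactory(weight_decay=1e-4, learning_rate=1e-3)
opt_builtin = tfa.optimizers.AdamW(weight_decay=1e-4, learning_rate=1e-3)
```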
The API of the new optimizer class differs slightly from the API of the
base optimizer:

- The first argument to the constructor is the weight decay rate.
- The optional keyword argument `exclude_from_weight_decay` accepts a list of
  regex patterns for variables to exclude from weight decay. Variables whose
  names contain a substring matching a pattern are excluded (a sketch follows
  this list).
- `minimize` and `apply_gradients` accept the optional keyword argument
  `decay_var_list`, which specifies the variables that should be decayed.
  It takes priority over `exclude_from_weight_decay` if specified.
  If both are `None`, all optimized variables are decayed.
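A minimal sketch of the exclusion patterns; the pattern strings `"bias"` and
`"layer_norm"` are illustrative assumptions about variable names, not names
this API requires:

```python
import tensorflow as tf
import tensorflow_addons as tfa

MyAdamW = tfa.optimizers.extend_with_decoupled_weight_decay(tf.keras.optimizers.Adam)

# Variables whose names match "bias" or "layer_norm" are not decayed; every
# other optimized variable is. A decay_var_list passed to minimize() or
# apply_gradients() would take priority over these patterns.
optimizer = MyAdamW(
    weight_decay=1e-4,
    learning_rate=1e-3,
    exclude_from_weight_decay=["bias", "layer_norm"],
)
```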
Usage example:
```python
# MyAdamW is a new class
MyAdamW = extend_with_decoupled_weight_decay(tf.keras.optimizers.Adam)
# Create a MyAdamW object
optimizer = MyAdamW(weight_decay=0.001, learning_rate=0.001)
# Update var1 and var2, but only decay var1
optimizer.minimize(loss, var_list=[var1, var2], decay_var_list=[var1])
```

Note: this extension decays weights BEFORE applying the update based on the
gradient, i.e. this extension only has the desired behaviour for optimizers
that do not depend on the value of `var` in the update step!

Note: when applying a decay to the learning rate, be sure to manually apply
the decay to the `weight_decay` as well. For example:

```python
step = tf.Variable(0, trainable=False)
schedule = tf.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-0, 1e-1, 1e-2])
# lr and wd can be a function or a tensor
lr = 1e-1 * schedule(step)
wd = lambda: 1e-4 * schedule(step)

# ...

optimizer = tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
```
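Note: you might want to register your own custom optimizer using
`tf.keras.utils.get_custom_objects()`.

A minimal sketch of one way to do this, assuming the `MyAdamW` class from the
usage example above; registering under the generated class's own `__name__` is
an assumption about how it will be looked up during deserialization:

```python
import tensorflow as tf
import tensorflow_addons as tfa

MyAdamW = tfa.optimizers.extend_with_decoupled_weight_decay(tf.keras.optimizers.Adam)

# Register the generated class under its own name so Keras can resolve it
# when deserializing optimizer or model configs that reference it.
tf.keras.utils.get_custom_objects()[MyAdamW.__name__] = MyAdamW

optimizer = MyAdamW(weight_decay=1e-4, learning_rate=1e-3)
```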
Args

`base_optimizer`: An optimizer class that inherits from
`tf.optimizers.Optimizer`.

Returns

A new optimizer class that inherits from `DecoupledWeightDecayExtension`
and `base_optimizer`.
[null,null,["Last updated 2023-05-25 UTC."],[],[],null,["# tfa.optimizers.extend_with_decoupled_weight_decay\n\n\u003cbr /\u003e\n\n|----------------------------------------------------------------------------------------------------------------------------------------------|\n| [View source on GitHub](https://github.com/tensorflow/addons/blob/v0.20.0/tensorflow_addons/optimizers/weight_decay_optimizers.py#L279-L371) |\n\nFactory function returning an optimizer class with decoupled weight decay. \n\n tfa.optimizers.extend_with_decoupled_weight_decay(\n base_optimizer: Type[keras_legacy_optimizer]\n ) -\u003e Type[keras_legacy_optimizer]\n\nReturns an optimizer class. An instance of the returned class computes the\nupdate step of `base_optimizer` and additionally decays the weights.\nE.g., the class returned by\n`extend_with_decoupled_weight_decay(tf.keras.optimizers.Adam)` is\nequivalent to [`tfa.optimizers.AdamW`](../../tfa/optimizers/AdamW).\n\nThe API of the new optimizer class slightly differs from the API of the\nbase optimizer:\n\n- The first argument to the constructor is the weight decay rate.\n- Optional keyword argument `exclude_from_weight_decay` accepts list of regex patterns of variables excluded from weight decay. Variables whose name contain a substring matching the pattern will be excluded.\n- `minimize` and `apply_gradients` accept the optional keyword argument `decay_var_list`, which specifies the variables that should be decayed. Note this takes priority over `exclude_from_weight_decay` if specified. If both `None`, all variables that are optimized are decayed.\n\n#### Usage example:\n\n # MyAdamW is a new class\n MyAdamW = extend_with_decoupled_weight_decay(tf.keras.optimizers.Adam)\n # Create a MyAdamW object\n optimizer = MyAdamW(weight_decay=0.001, learning_rate=0.001)\n # update var1, var2 but only decay var1\n optimizer.minimize(loss, var_list=[var1, var2], decay_variables=[var1])\n\n Note: this extension decays weights BEFORE applying the update based\n on the gradient, i.e. this extension only has the desired behaviour for\n optimizers which do not depend on the value of 'var' in the update step!\n\n Note: when applying a decay to the learning rate, be sure to manually apply\n the decay to the `weight_decay` as well. For example:\n\n ```python\n step = tf.Variable(0, trainable=False)\n schedule = tf.optimizers.schedules.PiecewiseConstantDecay(\n [10000, 15000], [1e-0, 1e-1, 1e-2])\n # lr and wd can be a function or a tensor\n lr = 1e-1 * schedule(step)\n wd = lambda: 1e-4 * schedule(step)\n\n # ...\n\n optimizer = tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)\n\n| **Note:** you might want to register your own custom optimizer using `tf.keras.utils.get_custom_objects()`.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ---- ||\n|------------------|----------------------------------------------------------------|\n| `base_optimizer` | An optimizer class that inherits from tf.optimizers.Optimizer. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ------- ||\n|---|---|\n| A new optimizer class that inherits from DecoupledWeightDecayExtension and base_optimizer. ||\n\n\u003cbr /\u003e"]]