Adam optimizer with weight decay that exactly matches the original BERT.
class AdamWeightDecay: Adam optimizer that adds L2 weight decay and applies clip_by_global_norm to the gradients.
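To make the two additions concrete, here is a minimal pure-Python sketch of one update step that clips gradients by their global norm and then applies an Adam update with weight decay. It is an illustration under stated assumptions, not the library's implementation: the function names, the parameter names, the use of bias correction, and the choice to apply the decay as a separate `lr * weight_decay_rate * p` term are all assumptions made for the example.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients so that their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm == 0.0 or global_norm <= clip_norm:
        return list(grads), global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in grads], global_norm


def adam_weight_decay_step(params, grads, m, v, step,
                           lr=1e-3, beta_1=0.9, beta_2=0.999, epsilon=1e-7,
                           weight_decay_rate=0.01, clip_norm=1.0):
    """One illustrative update over flat lists of scalar parameters.

    Assumptions (hypothetical, for illustration only): clipping happens
    before the moment updates, Adam uses bias correction, and the decay
    is applied directly to the parameter rather than added to the loss.
    """
    grads, _ = clip_by_global_norm(grads, clip_norm)
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta_1 * m_i + (1.0 - beta_1) * g
        v_i = beta_2 * v_i + (1.0 - beta_2) * g * g
        # Bias-corrected moment estimates (step counts from 1).
        m_hat = m_i / (1.0 - beta_1 ** step)
        v_hat = v_i / (1.0 - beta_2 ** step)
        # Weight decay term, then the Adam step.
        p = p - lr * weight_decay_rate * p
        p = p - lr * m_hat / (math.sqrt(v_hat) + epsilon)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


if __name__ == "__main__":
    params, m, v = adam_weight_decay_step(
        params=[1.0, -2.0], grads=[3.0, 4.0], m=[0.0, 0.0], v=[0.0, 0.0], step=1)
    print(params)
```

In this sketch the gradients `[3.0, 4.0]` have global norm 5, so with `clip_norm=1.0` they are rescaled before the moment estimates are updated. Each parameter is also shrunk toward zero in proportion to its own value, independently of its gradient.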
Last updated 2022-09-23 UTC.