Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook, Yan Dai
2024-05-30
Abstract: Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components still remains limited. In particular, most existing analyses of Adam show the convergence rate that can be simply achieved by non-adaptive algorithms like SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider the framework called online learning of updates/increments, where we choose the updates/increments of an optimizer based on an online learner. With this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to reinterpret the Adam optimizer from the perspective of online learning and to explore the importance of various components (such as momentum and decay factors) within the Adam optimizer through this lens.

#### Main Research Content

1. **Online Learning of Updates (OLU)**:
   - The paper introduces the Online Learning of Updates (OLU) framework, which transforms the design of optimization algorithms into the selection of a good online learner. Through this framework, the choice of optimizer is simplified to selecting an online learner that performs well in dynamic environments.
2. **Relationship between Adam and FTRL**:
   - The paper finds that the Adam optimizer is actually a special type of Follow-the-Regularized-Leader (FTRL) online learning algorithm. Specifically, when using a discount factor, Adam can be seen as a discounted version of FTRL (β-FTRL).
3. **Dynamic Regret Analysis**:
   - Through dynamic regret analysis, the paper demonstrates that momentum and decay factors are crucial for designing well-performing dynamic online learners. Without these components, FTRL performs poorly in dynamic environments.

#### Key Contributions

1. **Theoretical Analysis**:
   - Provides the equivalence relationship between the Adam optimizer and FTRL, and proves the superiority of the discounted version of FTRL (β-FTRL) in dynamic environments.
2. **Dynamic Regret Bounds**:
   - Presents the upper bounds of dynamic regret for β-FTRL in both unbounded and bounded domains, and demonstrates the importance of momentum and decay factors by comparing with baseline methods such as SGD and AdaGrad.

Through the above work, the paper offers a new perspective for understanding the Adam optimizer and reveals the importance of its core components (momentum and decay factors) in dynamic environments.
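
#### Illustrative Sketch

The link between Adam and a discounted online learner can be made concrete with a small numerical check. The sketch below is not the paper's code. It assumes a simplified Adam with no bias correction and ε = 0, and the function names are hypothetical. Under those assumptions, the usual recursive Adam update equals a ratio of discounted gradient sums, scaled by α = lr · (1 − β1) / √(1 − β2), which is the kind of closed-form expression a discounted FTRL view builds on.

```python
import numpy as np


def adam_updates_recursive(grads, lr=1e-3, beta1=0.9, beta2=0.999):
    """Simplified Adam (assumption: no bias correction, eps = 0)."""
    m = np.zeros_like(grads[0])  # first-moment (momentum) estimate
    v = np.zeros_like(grads[0])  # second-moment estimate
    updates = []
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        updates.append(-lr * m / np.sqrt(v))
    return np.array(updates)


def adam_updates_discounted_sums(grads, lr=1e-3, beta1=0.9, beta2=0.999):
    """Same updates written as a ratio of discounted sums of past gradients."""
    # Constant that absorbs the (1 - beta) factors of the moving averages.
    alpha = lr * (1 - beta1) / np.sqrt(1 - beta2)
    updates = []
    for t in range(len(grads)):
        exponents = np.arange(t, -1, -1)  # t - s for s = 0, ..., t
        w1 = beta1**exponents             # discount weights for the numerator
        w2 = beta2**exponents             # discount weights for the denominator
        past = grads[: t + 1]
        num = (w1[:, None] * past).sum(axis=0)
        den = np.sqrt((w2[:, None] * past**2).sum(axis=0))
        updates.append(-alpha * num / den)
    return np.array(updates)


# Synthetic gradients, used only for illustration.
rng = np.random.default_rng(0)
grads = rng.normal(size=(50, 3))

a = adam_updates_recursive(grads)
b = adam_updates_discounted_sums(grads)
print("max abs difference:", np.max(np.abs(a - b)))
```

The two functions should agree up to floating-point error. In the discounted-sum form, β1 controls how quickly old gradients are forgotten in the numerator (momentum) and β2 does the same for the scaling in the denominator, which matches the summary's point that the discounting is what makes the learner suited to dynamic environments.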