Abstract: Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components still remains limited. In particular, most existing analyses of Adam show the convergence rate that can be simply achieved by non-adaptive algorithms like SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider the framework called online learning of updates/increments, where we choose the updates/increments of an optimizer based on an online learner. With this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to reinterpret the Adam optimizer from the perspective of online learning and to explore the importance of various components (such as momentum and decay factors) within the Adam optimizer through this lens.

#### Main Research Content

1. **Online Learning of Updates (OLU)**:
   - The paper introduces the Online Learning of Updates (OLU) framework, which transforms the design of optimization algorithms into the selection of a good online learner. Through this framework, the choice of optimizer is simplified to selecting an online learner that performs well in dynamic environments.
2. **Relationship between Adam and FTRL**:
   - The paper finds that the Adam optimizer is actually a special type of Follow-the-Regularized-Leader (FTRL) online learning algorithm. Specifically, when using a discount factor, Adam can be seen as a discounted version of FTRL (β-FTRL).
3. **Dynamic Regret Analysis**:
   - Through dynamic regret analysis, the paper demonstrates that momentum and decay factors are crucial for designing well-performing dynamic online learners. Without these components, FTRL performs poorly in dynamic environments.

#### Key Contributions

1. **Theoretical Analysis**:
   - Provides the equivalence relationship between the Adam optimizer and FTRL, and proves the superiority of the discounted version of FTRL (β-FTRL) in dynamic environments.
2. **Dynamic Regret Bounds**:
   - Presents the upper bounds of dynamic regret for β-FTRL in both unbounded and bounded domains, and demonstrates the importance of momentum and decay factors by comparing with baseline methods such as SGD and AdaGrad.

Through the above work, the paper offers a new perspective for understanding the Adam optimizer and reveals the importance of its core components (momentum and decay factors) in dynamic environments.
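To make the Adam/FTRL correspondence concrete, here is a minimal scalar sketch in Python. It is an illustration under stated assumptions, not the paper's exact formulation: Adam is written without bias correction or the epsilon term, the discounted FTRL learner is assumed to use a quadratic regularizer with an adaptively scaled step size, and the coupling `beta2 = beta1 ** 2` is assumed. The function names `adam_updates` and `discounted_ftrl_updates` are hypothetical. Under these assumptions the two update sequences coincide up to a constant factor.

```python
import math


def adam_updates(grads, lr, beta1, beta2):
    """Adam increments, simplified: no bias correction, no epsilon (assumption)."""
    m, v, out = 0.0, 0.0, []
    for g in grads:
        m = beta1 * m + (1 - beta1) * g        # momentum: EMA of gradients
        v = beta2 * v + (1 - beta2) * g * g    # EMA of squared gradients
        out.append(-lr * m / math.sqrt(v))     # requires a nonzero first gradient
    return out


def discounted_ftrl_updates(grads, scale, beta):
    """Increments from a discounted FTRL learner (assumed form).

    At step t the learner minimizes over delta:
        sum_s beta^(t-s) * g_s * delta + delta^2 / (2 * eta_t),
    whose closed-form minimizer is delta = -eta_t * sum_s beta^(t-s) * g_s.
    The step size eta_t is scaled by the discounted squared gradients.
    """
    out = []
    for t in range(len(grads)):
        disc_sum = sum(beta ** (t - s) * grads[s] for s in range(t + 1))
        disc_sq = sum((beta ** (t - s) * grads[s]) ** 2 for s in range(t + 1))
        eta = scale / math.sqrt(disc_sq)
        out.append(-eta * disc_sum)
    return out


# Online learning of updates: the optimizer iterate is x_{t+1} = x_t + delta_t,
# where delta_t is whatever the online learner outputs.
grads = [1.0, -0.5, 2.0, 0.3]   # hypothetical gradient sequence
lr, beta = 0.1, 0.9

adam = adam_updates(grads, lr, beta1=beta, beta2=beta ** 2)
# Constant that absorbs the (1 - beta) normalizations of Adam's two averages.
scale = lr * (1 - beta) / math.sqrt(1 - beta ** 2)
ftrl = discounted_ftrl_updates(grads, scale, beta)

matches = all(math.isclose(a, f, rel_tol=1e-9) for a, f in zip(adam, ftrl))
```

The closed-form sums in `discounted_ftrl_updates` are written for readability rather than speed; the recursive form in `adam_updates` computes the same quantities in constant time per step.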

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD

CAdam: Confidence-Based Optimization for Online Learning

UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization

Provable Adaptivity of Adam under Non-uniform Smoothness

UAdam: Unified Adam-Type Algorithmic Framework for Nonconvex Optimization

Adam: A Method for Stochastic Optimization

A Novel Convergence Analysis for Algorithms of the Adam Family

A Randomized Block-Coordinate Adam online learning optimization algorithm

A modification of adaptive moment estimation (adam) for machine learning

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps

Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration

Deconstructing What Makes a Good Optimizer for Language Models

Improving Adaptive Online Learning Using Refined Discretization

Convergence rates for the Adam optimizer

Continuous-Time Analysis of Adaptive Optimization and Normalization

HyperAdam: A Learnable Task-Adaptive Adam for Network Training

CaAdam: Improving Adam optimizer using connection aware methods

Discounted Adaptive Online Learning: Towards Better Regularization