UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization

Yiming Jiang, Jinlan Liu, Dongpo Xu, Danilo P. Mandic
DOI: https://doi.org/10.1162/neco_a_01692
2023-05-09
Abstract: Adam-type algorithms have become a preferred choice for optimisation in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms (called UAdam). This is equipped with a general form of the second-order moment, which makes it possible to include Adam and its variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. This is supported by a rigorous convergence analysis of UAdam in the non-convex stochastic setting, showing that UAdam converges to the neighborhood of stationary points at a rate of $\mathcal{O}(1/T)$. Furthermore, the size of the neighborhood decreases as $\beta$ increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also show that vanilla Adam can converge with appropriately selected hyperparameters, which provides a theoretical guarantee for the analysis, applications, and further developments of the whole class of Adam-type algorithms.
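The abstract's central idea, a single update rule whose second-order moment is left general so that different choices recover different optimizers, can be illustrated with a small sketch. This is not the paper's parametrization: it assumes a scalar parameter and no bias correction, and the class name `UnifiedAdamSketch` and the `rule` switch are hypothetical. The Adam and AMSGrad rules are the standard ones; the AdaFom rule is taken here as a running arithmetic mean of g² (our reading of that method). NAdam, AdaBound and Adan are not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class UnifiedAdamSketch:
    """Hypothetical Adam-type optimizer for one scalar parameter.

    m_t     = beta1 * m_{t-1} + (1 - beta1) * g_t   (first-order moment)
    v_t     = rule-dependent second-order moment     (the pluggable part)
    x_{t+1} = x_t - lr * m_t / (sqrt(v_t) + eps)
    No bias correction is applied.
    """
    lr: float = 0.01
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    rule: str = "adam"      # "adam", "amsgrad" or "adafom" (illustrative names)
    t: int = 0              # step counter
    m: float = 0.0          # first-order moment
    v_ema: float = 0.0      # exponential moving average of g^2
    v_max: float = 0.0      # running maximum of v_ema (AMSGrad rule)
    v_mean: float = 0.0     # running arithmetic mean of g^2 (AdaFom-style rule)

    def second_moment(self, g: float) -> float:
        """Update all statistics and return the v_t used in the denominator."""
        self.v_ema = self.beta2 * self.v_ema + (1 - self.beta2) * g * g
        self.v_max = max(self.v_max, self.v_ema)
        self.v_mean += (g * g - self.v_mean) / self.t
        if self.rule == "adam":
            return self.v_ema
        if self.rule == "amsgrad":
            return self.v_max
        if self.rule == "adafom":
            return self.v_mean
        raise ValueError(f"unknown rule: {self.rule!r}")

    def step(self, x: float, g: float) -> float:
        """Return the updated parameter after one step with gradient g."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        v = self.second_moment(g)
        return x - self.lr * self.m / (math.sqrt(v) + self.eps)


if __name__ == "__main__":
    # Usage: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    for rule in ("adam", "amsgrad", "adafom"):
        opt = UnifiedAdamSketch(lr=0.01, rule=rule)
        x = 0.0
        for _ in range(5000):
            x = opt.step(x, 2.0 * (x - 3.0))
        print(f"{rule:8s} x = {x:.4f}")
```

Only `second_moment` changes between rules, which mirrors the framework's premise that the variants can be expressed through the choice of second-order moment while the first-order momentum and the update step stay fixed.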
Machine Learning, Numerical Analysis, Optimization and Control
What problem does this paper attempt to address?
The problem this paper addresses is the convergence of Adam and its variants in non-convex stochastic optimization. Although Adam has achieved remarkable success in deep learning, its convergence is still not fully understood. Specifically:

1. **Convergence of Adam**: Although Adam performs well in practice, Reddi et al. showed that it can fail to converge to the optimal solution even on simple convex problems. In response, many variants of Adam (such as AMSGrad and AdaBound) have been proposed. These variants differ mainly in how they construct the second-order moment, but there has been no unified theoretical framework that explains their convergence.
2. **Limitations of existing analyses**: Most previous work focused on the online convex setting and does not explain convergence in the non-convex settings common in practice. In addition, many analyses impose strict requirements on the second-order momentum parameter \(\beta_2\) that are inconsistent with the hyperparameter values used in practice.

To address these problems, the paper proposes a unified Adam-type algorithmic framework (UAdam) with the following aims:

- **A general framework**: UAdam includes Adam and its variants as special cases, providing a single platform for their theoretical analysis.
- **Relaxed conditions on the second-order momentum parameter**: The paper proves that UAdam converges without any restriction on \(\beta_2\), provided the first-order momentum parameter \(\beta_1\) is close enough to 1.
- **A convergence rate**: The paper proves that, in the non-convex stochastic setting, UAdam converges to a neighborhood of stationary points at a rate of \(\mathcal{O}(1/T)\), and that this neighborhood shrinks as the first-order momentum parameter increases.
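The non-convergence result attributed to Reddi et al. in item 1 can be reproduced in a few lines. The sketch below is a minimal simulation modeled on their online counterexample, with constants chosen here for illustration: linear losses \(f_t(x) = Cx\) when \(t \bmod 3 = 1\) and \(f_t(x) = -x\) otherwise, feasible set \([-1, 1]\), \(C = 3\), \(\beta_1 = 0\), \(\beta_2 = 1/(1 + C^2)\), step size \(0.5/\sqrt{t}\), no \(\epsilon\) and no bias correction. Over each block of three steps the total loss is \((C - 2)x\), so \(x = -1\) is optimal. The helper names `grad_at` and `run` are hypothetical. The two runs differ only in the second-order moment: Adam uses the moving average \(v_t\), while AMSGrad uses its running maximum.

```python
import math


def grad_at(t: int, C: float) -> float:
    """Gradient of the linear loss f_t(x) = C*x if t % 3 == 1 else -x (t starts at 1)."""
    return C if t % 3 == 1 else -1.0


def run(rule: str, T: int = 3000, C: float = 3.0, alpha: float = 0.5) -> float:
    """Projected Adam (rule="adam") or AMSGrad (rule="amsgrad") with beta1 = 0."""
    beta2 = 1.0 / (1.0 + C * C)
    x, v, v_max = 0.0, 0.0, 0.0
    for t in range(1, T + 1):
        g = grad_at(t, C)
        v = beta2 * v + (1.0 - beta2) * g * g     # EMA of squared gradients
        v_max = max(v_max, v)                     # AMSGrad keeps the largest v seen
        denom = math.sqrt(v if rule == "adam" else v_max)
        x -= (alpha / math.sqrt(t)) * g / denom   # beta1 = 0, so m_t = g_t
        x = min(1.0, max(-1.0, x))                # project back onto [-1, 1]
    return x


x_adam = run("adam")
x_amsgrad = run("amsgrad")
print(f"Adam:    x_T = {x_adam:+.4f}")
print(f"AMSGrad: x_T = {x_amsgrad:+.4f}")
```

With these settings the Adam run is pushed to the worst point \(x = 1\): the small \(\beta_2\) makes \(v_t\) forget the rare large gradient \(C\) within a step or two, so the two small gradients of \(-1\) produce larger moves than the one informative gradient. AMSGrad keeps the large \(v\) in its denominator and drifts toward \(x = -1\). This is the kind of second-order-moment difference that a unified framework like UAdam is meant to account for.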
Through these contributions, the paper offers a new perspective on the convergence of Adam and its variants and lays a theoretical foundation for developing new optimization algorithms.