ABNGrad: adaptive step size gradient descent for optimizing neural networks
Wenhan Jiang, Yuqing Liang, Zhixia Jiang, Dongpo Xu, Linhua Zhou
DOI: https://doi.org/10.1007/s10489-024-05303-6
IF: 5.3
2024-02-17
Applied Intelligence
Abstract: Stochastic adaptive gradient descent algorithms, such as AdaGrad and Adam, are extensively used to train deep neural networks. However, randomly sampling gradient information introduces instability to the learning rates, which can cause adaptive methods to generalize poorly. To address this issue, the ABNGrad algorithm, which leverages the absolute value operation and the normalization technique, is proposed. More specifically, the absolute value function is first incorporated into the iteration of the second-order moment estimate to ensure that it monotonically increases. Then, the normalization technique is employed to prevent a rapid decrease in the learning rate. The techniques used in this paper can also be integrated into other existing adaptive algorithms, such as Adam, AdamW, AdaBound, and RAdam, yielding good performance. Additionally, it is shown that ABNGrad can attain the optimal regret bound for solving online convex optimization problems. Finally, experimental results illustrate the effectiveness of ABNGrad. Further details on the advantages of the proposed approach and on its implementation are available at https://github.com/Wenhan-Jiang/ABNGrad.git
computer science, artificial intelligence
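
Illustrative sketch (not from the paper): the abstract states that an absolute value is placed in the second-order moment iteration so that the estimate increases monotonically, but it does not give the update rule. The Python below shows one way such a rule could look, assuming the absolute value wraps the difference between the squared gradient and the previous estimate. The function names, the exact form of the update, and the sample gradients are all assumptions made for illustration. The normalization step is omitted because the abstract does not specify it. The repository linked above is the authoritative implementation.

```python
def adam_second_moment_step(v_prev, grad, beta2=0.999):
    """Standard Adam update: exponential moving average of squared gradients.

    Equivalent to v_prev + (1 - beta2) * (grad**2 - v_prev), so the
    estimate can shrink whenever grad**2 is smaller than v_prev.
    """
    return beta2 * v_prev + (1.0 - beta2) * grad * grad


def abs_second_moment_step(v_prev, grad, beta2=0.999):
    """Hypothetical absolute-value variant (an assumed form, not the paper's).

    Wrapping the difference in abs() makes the increment non-negative,
    so the estimate never decreases from one step to the next.
    """
    return v_prev + (1.0 - beta2) * abs(grad * grad - v_prev)


def run(grads, step_fn, beta2=0.9):
    """Apply step_fn over a gradient sequence, starting from v = 0."""
    v = 0.0
    history = []
    for g in grads:
        v = step_fn(v, g, beta2)
        history.append(v)
    return history


def is_non_decreasing(values):
    """True if every element is at least as large as the one before it."""
    return all(b >= a for a, b in zip(values, values[1:]))


# Made-up scalar gradients that alternate between large and small values.
grads = [1.0, 0.1, 2.0, 0.05, 0.5]
adam_history = run(grads, adam_second_moment_step)
abs_history = run(grads, abs_second_moment_step)

print("Adam estimate non-decreasing:", is_non_decreasing(adam_history))
print("Absolute-value estimate non-decreasing:", is_non_decreasing(abs_history))
```

In an Adam-style optimizer the step size is divided by the square root of this estimate, so an estimate that fluctuates makes the effective learning rate fluctuate as well, while one that never decreases gives a learning rate that never increases. An estimate that only grows also shrinks the learning rate over time, which is presumably the effect the normalization technique mentioned in the abstract is meant to counteract.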