A fast adaptive algorithm for training deep neural networks

Yangting Gui, Dequan Li, Runyue Fang
DOI: https://doi.org/10.1007/s10489-022-03629-7
IF: 5.3
2022-06-07
Applied Intelligence
Abstract: Among adaptive algorithms, Adam is the most widely used, especially for training deep neural networks. However, recent studies have shown that it generalizes poorly and can even fail to converge in extreme cases. AdaX (2020) is a variant of Adam that modifies Adam's second moment, giving the algorithm good generalization ability compared with SGD. This work aims to improve AdaX so that it converges faster and reaches higher training accuracy. The first moment of AdaX is essentially a classical momentum term, whereas Nesterov's accelerated gradient (NAG) is theoretically and experimentally superior to classical momentum. We therefore replace the classical momentum term in the first moment of AdaX with NAG and name the resulting algorithm Nesterov's accelerated AdaX (Nadax). Extensive experiments on deep learning tasks show that training models with the proposed Nadax brings favorable benefits.
computer science, artificial intelligence
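The abstract's central change (swapping AdaX's classical momentum for a Nesterov look-ahead) can be illustrated with a minimal Python sketch. It assumes the AdaX update in the form m_t = β1·m_{t-1} + (1−β1)·g_t for the first moment and v_t = (1+β2)·v_{t-1} + β2·g_t² for the second moment, bias-corrected by dividing by (1+β2)^t − 1. The abstract does not state Nadax's exact update, so the Nadam-style look-ahead used here, the toy defaults (lr=0.1, beta1=0.9, beta2=1e-4), and the helper names nadax_step and init_state are hypothetical, not the authors' implementation.

```python
import math


def init_state(n):
    """Hypothetical helper: zero-initialized optimizer state for n parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def nadax_step(params, grads, state, lr=0.1, beta1=0.9, beta2=1e-4, eps=1e-8):
    """Hypothetical sketch of one Nadax-style update on a list of floats.

    Assumption: the Nesterov term is the simplified Nadam-style look-ahead
    beta1 * m_t + (1 - beta1) * g_t; the paper's exact rule may differ.
    """
    state["t"] += 1
    t = state["t"]
    m, v = state["m"], state["v"]
    # AdaX bias correction: v_t is a (1 + beta2)-weighted sum of squared
    # gradients, so dividing by (1 + beta2)**t - 1 makes it a weighted average.
    v_correction = (1.0 + beta2) ** t - 1.0
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # First moment: classical momentum as an exponential moving average.
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        # AdaX second moment: long-term memory that grows instead of decaying.
        v[i] = (1.0 + beta2) * v[i] + beta2 * g * g
        v_hat = v[i] / v_correction
        # Nesterov look-ahead (assumed form): mix the updated momentum
        # with the current gradient before taking the step.
        m_nesterov = beta1 * m[i] + (1.0 - beta1) * g
        updated.append(p - lr * m_nesterov / (math.sqrt(v_hat) + eps))
    return updated


if __name__ == "__main__":
    # Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    x = [0.0]
    state = init_state(1)
    for _ in range(1000):
        x = nadax_step(x, [2.0 * (x[0] - 3.0)], state)
    print(x[0])
```

On the toy quadratic the iterate should settle near x = 3. The only difference from a plain AdaX step in this sketch is the `m_nesterov` line: deleting it and stepping with `m[i]` directly recovers the classical-momentum first moment that the abstract says Nadax replaces.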