Unbiased Quasi-Hyperbolic Nesterov-Gradient Momentum-Based Optimizers for Accelerating Convergence

Weiwei Cheng,Xiaochun Yang,Bin Wang,Wei Wang
DOI: https://doi.org/10.1007/s11280-022-01086-3
2022-01-01
World Wide Web
Abstract:In the training process of deep learning models, one of the important steps is to choose an appropriate optimizer that directly determines the final performance of the model. Choosing the appropriate direction and step size (i.e. learning rate) of parameter update are decisive factors for optimizers. Previous gradient descent optimizers could be oscillated and fail to converge to a minimum point because they are only sensitive to the current gradient. Momentum-Based Optimizers (MBOs) have been widely adopted recently since they can relieve oscillation to accelerate convergence by using the exponentially decaying average of gradients to fine-tune the direction. However, we find that most of the existing MBOs are biased and inconsistent with the local fastest descent direction resulting in a high number of iterations. To accelerate convergence, we propose an Unbiased strategy to adjust the descent direction of a variety of MBOs. We further propose an Unbiased Quasi-hyperbolic Nesterov-gradient strategy (UQN) by combining our Unbiased strategy with the existing Quasi-hyperbolic and Nesterov-gradient. It makes each update step move in the local fastest descent direction, predicts the future gradient to avoid crossing the minimum point, and reduces gradient variance. We extend our strategies to multiple MBOs and prove the convergence of our strategies. The main experimental results presented in this paper are based on popular neural network models and benchmark datasets. The experimental results demonstrate the effectiveness and universality of our proposed strategies.
What problem does this paper attempt to address?