Abstract: Stochastic Gradient Descent (SGD) optimization methods play a vital role in training neural networks and attract growing attention across the science and engineering of intelligent systems. The choice of learning rate affects the convergence rate of these methods. Current learning rate adjustment strategies face two main problems: (1) traditional learning rate decay is mostly scheduled manually over the training iterations, and the small learning rates it produces slow the convergence of neural network training; (2) adaptive methods (e.g., Adam) generalize poorly. To alleviate these issues, we propose a novel automatic learning rate decay strategy for SGD optimization methods in neural networks. Based on the observation that the upper bound on the convergence rate can be minimized at a given iteration with respect to the current learning rate, we first derive an expression for the current learning rate in terms of the historical learning rates. Only one extra parameter needs to be initialized to generate automatically decreasing learning rates during training. The proposed approach is applied to the SGD and Momentum SGD optimization algorithms, and a theoretical proof establishes its convergence. Numerical simulations are conducted on the MNIST and CIFAR-10 data sets with different neural networks. Experimental results show that our algorithm outperforms existing classical methods, achieving faster convergence, better stability, and better generalization in neural network training. It also lays a foundation for large-scale parallel search of initial parameters in intelligent systems.
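The idea of a learning rate that is computed from earlier learning rates with a single extra parameter can be illustrated with a minimal sketch. The abstract does not state the paper's expression, so the recurrence below, eta_{t+1} = eta_t / (1 + c * eta_t), is only a stand-in chosen for illustration; the names `decayed_learning_rates`, `sgd_quadratic`, and the parameter `c` are hypothetical, and the toy objective f(w) = 0.5 * w**2 replaces a neural network loss.

```python
# Illustrative stand-in only: this recurrence is NOT the expression derived
# in the paper, which the abstract does not give.


def decayed_learning_rates(eta0, c, steps):
    """Return [eta_0, ..., eta_{steps-1}] with eta_{t+1} = eta_t / (1 + c * eta_t).

    `c` is the single extra parameter (hypothetical name); each rate depends
    only on the previous one, so no manual schedule is needed.
    """
    rates = [eta0]
    for _ in range(steps - 1):
        prev = rates[-1]
        rates.append(prev / (1.0 + c * prev))
    return rates


def sgd_quadratic(eta0, c, steps, w0=5.0):
    """Run gradient steps on the toy objective f(w) = 0.5 * w**2."""
    w = w0
    for eta in decayed_learning_rates(eta0, c, steps):
        grad = w  # exact gradient of the toy objective; real SGD uses a noisy mini-batch gradient
        w -= eta * grad
    return w


if __name__ == "__main__":
    print(decayed_learning_rates(0.5, 1.0, 4))
    print(sgd_quadratic(0.5, 1.0, 50))
```

For this particular stand-in the recurrence has the closed form eta_t = eta_0 / (1 + c * eta_0 * t), which is the familiar inverse-time decay; the point of the sketch is only that the schedule is generated automatically from one initialized parameter rather than tuned by hand at each stage.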

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

Gradient Descent: The Ultimate Optimizer

Gradient Descent, Stochastic Optimization, and Other Tales

An overview of gradient descent optimization algorithms

Optimization Methods in Deep Learning: A Comprehensive Overview

An Efficient Optimization Technique for Training Deep Neural Networks

Optimization for deep learning: theory and algorithms

Learning Gradient Descent: Better Generalization and Longer Horizons

Gradient Descent Optimization in Deep Learning Model Training Based on Multistage and Method Combination Strategy

Automatic Gradient Descent: Deep Learning without Hyperparameters

Learning by Turning: Neural Architecture Aware Optimisation

A survey of deep learning optimizers -- first and second order methods

An automatic learning rate decay strategy for stochastic gradient descent optimization methods in neural networks

Exploring the Optimized Value of Each Hyperparameter in Various Gradient Descent Algorithms

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimiax Optimization

Deep Genetic Network

Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant

Gradient Descent Finds Global Minima of Deep Neural Networks

No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths