State Space Representation and Phase Analysis of Gradient Descent Optimizers.

Biyuan Yao, Guiqing Li, Wei Wu
DOI: https://doi.org/10.1007/s11432-022-3539-8
2023-01-01
Abstract: Deep learning has achieved good results in image recognition, owing to the key role the optimizer plays in a deep learning network. In this work, optimizers are modeled as dynamical systems, and the influence of parameter adjustments on the dynamic performance of the system is analyzed. This is a useful supplement to existing theoretical control models of optimizers. First, a system control model is derived from the iterative formula of the optimizer: the optimizer is expressed by differential equations, and its control equation is established. Second, based on this system control model, the phase trajectories of the optimizer model and the influence of different hyperparameters on the system performance of the learning model are analyzed. Finally, controllers with different optimizers and different hyperparameters are used to classify the MNIST and CIFAR-10 datasets, both to verify the effects of different optimizers on learning performance and to compare them with related methods. Experimental results show that selecting an appropriate optimizer can accelerate model convergence and improve recognition accuracy. Furthermore, the convergence speed and performance of the stochastic gradient descent (SGD) optimizer are better than those of the stochastic gradient descent-momentum (SGD-M) and Nesterov accelerated gradient (NAG) optimizers.
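The abstract does not give the paper's equations, so the following is only an illustrative sketch of what a state-space view of an optimizer can look like, not the authors' model. It assumes the common heavy-ball form of SGD-M (v ← μ·v − η·∇f, θ ← θ + v) applied without gradient noise to a one-dimensional quadratic loss f(θ) = ½·a·θ², and it works in discrete time rather than with differential equations. Under those assumptions the update is linear in the state (θ, v), so it can be written as one 2x2 matrix whose eigenvalues determine convergence. All function and parameter names here are hypothetical.

```python
import cmath


def momentum_state_matrix(lr, mu, curvature):
    """State matrix A such that [theta, v]_{k+1} = A [theta, v]_k.

    Assumes heavy-ball momentum on f(theta) = 0.5 * curvature * theta**2:
        v_{k+1}     = mu * v_k - lr * curvature * theta_k
        theta_{k+1} = theta_k + v_{k+1}
    Setting mu = 0 reduces the theta update to plain gradient descent.
    """
    return [[1.0 - lr * curvature, mu],
            [-lr * curvature, mu]]


def phase_trajectory(A, theta0, v0, steps):
    """Iterate the linear system and return the list of (theta, v) points."""
    theta, v = theta0, v0
    points = [(theta, v)]
    for _ in range(steps):
        # Tuple assignment keeps both updates based on the old state.
        theta, v = (A[0][0] * theta + A[0][1] * v,
                    A[1][0] * theta + A[1][1] * v)
        points.append((theta, v))
    return points


def spectral_radius(A):
    """Largest eigenvalue magnitude of a 2x2 matrix (trace/determinant form)."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + disc) / 2.0), abs((trace - disc) / 2.0))


if __name__ == "__main__":
    # Hyperparameter values chosen only for illustration.
    A = momentum_state_matrix(lr=0.1, mu=0.9, curvature=1.0)
    print("spectral radius:", spectral_radius(A))
    for theta, v in phase_trajectory(A, theta0=1.0, v0=0.0, steps=5):
        print(f"theta={theta:+.4f}  v={v:+.4f}")
```

Plotting the returned (θ, v) pairs against each other gives a phase portrait for this toy system. A spectral radius below 1 means the trajectory contracts toward the origin, and complex eigenvalues mean it spirals rather than approaching monotonically. Varying `lr` and `mu` shows how hyperparameters move the system between these regimes.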