Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Yuhan Ma, Dan Sun, Erdi Gao, Ningjing Sang, Iris Li, Guanming Huang
DOI: https://doi.org/10.48550/arXiv.2409.04707
2024-09-07
Abstract: Optimization theory serves as a pivotal scientific instrument for achieving optimal system performance, with its origins in economic applications to identify the best investment strategies for maximizing benefits. Over the centuries, from the geometric inquiries of ancient Greece to the calculus contributions by Newton and Leibniz, optimization theory has significantly advanced. The persistent work of scientists like Lagrange, Cauchy, and von Neumann has fortified its progress. The modern era has seen an unprecedented expansion of optimization theory applications, particularly with the growth of computer science, enabling more sophisticated computational practices and widespread utilization across engineering, decision analysis, and operations research. This paper delves into the profound relationship between optimization theory and deep learning, highlighting the omnipresence of optimization problems in the latter. We explore the gradient descent algorithm and its variants, which are the cornerstone of optimizing neural networks. The chapter introduces an enhancement to the SGD optimizer, drawing inspiration from numerical optimization methods, aiming to enhance interpretability and accuracy. Our experiments on diverse deep learning tasks substantiate the improved algorithm's efficacy. The paper concludes by emphasizing the continuous development of optimization theory and its expanding role in solving intricate problems, enhancing computational capabilities, and informing better policy decisions.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper aims to improve the performance of deep-learning models by improving the gradient descent algorithm used to train them. It examines the connection between optimization theory and deep learning, in particular the central role that gradient descent and its variants play in neural network training. The authors propose a modification of the Stochastic Gradient Descent (SGD) optimizer that is inspired by numerical optimization techniques and is intended to improve interpretability and accuracy. The key idea is to apply a numerical method, the Taylor multi-step method, to the traditional SGD update, yielding a new optimizer called TM-SGD. The authors report experiments on multiple datasets in which TM-SGD outperforms traditional SGD and other optimizers on image classification, object detection, segmentation, facial key point detection, image generation, text classification, and click-through rate prediction. In short, the problem the paper addresses is how to improve existing deep-learning optimization algorithms by incorporating numerical optimization methods, with the goal of better model performance.
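
To make the idea of a multi-step update concrete, the following is a minimal sketch and not the paper's TM-SGD rule, whose exact formula is not given on this page. It treats plain gradient descent as a forward-Euler step on the gradient flow and contrasts it with a two-step Adams-Bashforth update, a classical linear multistep method used here purely as a stand-in. The toy quadratic loss, the function names `gd` and `ab2`, and the learning rate are all assumptions chosen for illustration.

```python
# Illustrative sketch only: this is NOT the paper's TM-SGD update rule.
# It contrasts plain gradient descent with a two-step Adams-Bashforth
# update (a classical linear multistep method) on a toy quadratic.

def loss(w):
    """Toy loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2) (assumed for illustration)."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def grad(w):
    """Exact gradient of the toy loss above."""
    return [w[0], 10.0 * w[1]]


def gd(w0, lr, steps):
    """Plain gradient descent: w <- w - lr * g_k (forward Euler on gradient flow)."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def ab2(w0, lr, steps):
    """Two-step Adams-Bashforth: w <- w - lr * (1.5 * g_k - 0.5 * g_{k-1})."""
    w = list(w0)
    g_prev = None
    for _ in range(steps):
        g = grad(w)
        if g_prev is None:
            direction = g  # no history yet, so the first step is a plain Euler step
        else:
            direction = [1.5 * gi - 0.5 * gpi for gi, gpi in zip(g, g_prev)]
        w = [wi - lr * di for wi, di in zip(w, direction)]
        g_prev = g
    return w


if __name__ == "__main__":
    start = [1.0, 1.0]
    print("GD  final loss:", loss(gd(start, lr=0.05, steps=200)))
    print("AB2 final loss:", loss(ab2(start, lr=0.05, steps=200)))
```

The multistep variant reuses the previous gradient, so it needs one extra stored vector per parameter but no extra gradient evaluations. On a real network the gradients would be stochastic mini-batch estimates, and the stable range of learning rates differs between the two updates, so the learning rate would need to be tuned separately.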