Stochastic Gradient Descent Revisited

Azar Louzi
2024-12-09
Abstract: Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory, however, often requires a strong framework to guarantee convergence properties. We present a full-scope convergence study of biased nonconvex SGD, covering weak convergence, function-value convergence, and global convergence, and provide the associated convergence rates and complexities, all under relatively mild conditions compared with the literature.
Optimization and Control, Probability, Machine Learning