A Mini-Batch Stochastic Conjugate Gradient Algorithm with Variance Reduction.

Kou Caixia,Yang Han
DOI: https://doi.org/10.1007/s10898-022-01205-4
2022-01-01
Journal of Global Optimization
Abstract:Stochastic gradient descent method is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, there have been many explicit variance reduction methods for stochastic descent, such as SVRG Johnson and Zhang [Advances in neural information processing systems, (2013), pp. 315–323], SAG Roux et al. [Advances in neural information processing systems, (2012), pp. 2663–2671], SAGA Defazio et al. [Advances in neural information processing systems, (2014), pp. 1646–1654] and so on. Conjugate gradient method, which has the same computation cost with gradient descent method, is considered. In this paper, in the spirit of SAGA, we propose a stochastic conjugate gradient algorithm which we call SCGA. With the Fletcher and Reeves type choices, we prove a linear convergence rate for smooth and strongly convex functions. We experimentally demonstrate that SCGA converges faster than the popular SGD type algorithms for four machine learning models, which may be convex, nonconvex or nonsmooth. Solving regression problems, SCGA is competitive with CGVR, which is the only one stochastic conjugate gradient algorithm with variance reduction so far, as we know.
What problem does this paper attempt to address?