Abstract: In this paper, we propose a simple variant of the original stochastic variance reduced gradient (SVRG) method, which we hereafter refer to as variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot point and starting point in SVRG and its proximal variant, Prox-SVRG, the two vectors of each epoch in VR-SGD are set to the average and the last iterate of the previous epoch, respectively. This setting allows us to use much larger learning rates (step sizes) than SVRG, e.g., 3/(7L) for VR-SGD vs. 1/(10L) for SVRG, and also makes our convergence analysis more challenging. In fact, the larger learning rate enjoyed by VR-SGD means that the variance of its stochastic gradient estimator asymptotically approaches zero more rapidly. Unlike common stochastic methods such as SVRG and proximal stochastic methods such as Prox-SVRG, VR-SGD uses two different update rules for smooth and non-smooth objective functions, respectively. In other words, VR-SGD can tackle non-smooth and/or non-strongly convex problems directly, without using any reduction techniques such as quadratic regularizers. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems and show that it attains a linear convergence rate. We also provide convergence guarantees for VR-SGD on non-strongly convex problems. Experimental results show that VR-SGD performs significantly better than its counterparts, SVRG and Prox-SVRG, and also much better than the best-known stochastic method, Katyusha.
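The epoch structure described in the abstract can be illustrated with a minimal sketch for the smooth case. Everything specific here is an assumption made for illustration, not the authors' implementation: the least-squares objective f_i(x) = 0.5 (a_i·x - b_i)^2, the helper names (`grad_i`, `full_grad`, `loss`, `vr_sgd`), the use of L = max_i ||a_i||^2 as the smoothness constant, and averaging over all m inner iterates to form the snapshot. Only the step size 3/(7L) and the rule "snapshot = average, starting point = last iterate of the previous epoch" come from the abstract.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def grad_i(a, b, x):
    # Gradient of the assumed component f_i(x) = 0.5 * (a.x - b)^2.
    r = dot(a, x) - b
    return [r * aj for aj in a]


def full_grad(A, b, x):
    # Gradient of f(x) = (1/n) * sum_i f_i(x).
    n = len(A)
    g = [0.0] * len(x)
    for a_i, b_i in zip(A, b):
        gi = grad_i(a_i, b_i, x)
        g = [gj + gij / n for gj, gij in zip(g, gi)]
    return g


def loss(A, b, x):
    return sum(0.5 * (dot(a_i, x) - b_i) ** 2 for a_i, b_i in zip(A, b)) / len(A)


def vr_sgd(A, b, x0, epochs, m, seed=0):
    """Sketch of the VR-SGD epoch structure for a smooth objective."""
    rng = random.Random(seed)
    n = len(A)
    L = max(dot(a_i, a_i) for a_i in A)  # assumed smoothness constant
    eta = 3.0 / (7.0 * L)                # step size quoted in the abstract
    x = list(x0)                         # starting point of the first epoch
    snapshot = list(x0)                  # snapshot point of the first epoch
    for _ in range(epochs):
        mu = full_grad(A, b, snapshot)   # full gradient at the snapshot
        running_sum = [0.0] * len(x)
        for _ in range(m):
            i = rng.randrange(n)
            g_x = grad_i(A[i], b[i], x)
            g_s = grad_i(A[i], b[i], snapshot)
            # Variance-reduced gradient estimate, as in SVRG.
            x = [xj - eta * (gx - gs + muj)
                 for xj, gx, gs, muj in zip(x, g_x, g_s, mu)]
            running_sum = [sj + xj for sj, xj in zip(running_sum, x)]
        # Next snapshot: average of this epoch's iterates.
        snapshot = [sj / m for sj in running_sum]
        # Next starting point: the last iterate, so x simply carries over.
    return x


if __name__ == "__main__":
    # Hypothetical toy data with an exact solution x* = [1, -2].
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    b = [1.0, -2.0, -1.0, 3.0]
    x_hat = vr_sgd(A, b, [0.0, 0.0], epochs=40, m=8)
    print(x_hat, loss(A, b, x_hat))
```

In standard SVRG, by contrast, the snapshot and the starting point of an epoch are the same vector; in this sketch the only change is the pair of assignments at the end of each epoch.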

IS-ASGD

Asynchronous Accelerated Stochastic Gradient Descent.

Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization

Accelerated Stochastic ADMM with Variance Reduction

FVR-SGD: A New Flexible Variance-Reduction Method for SGD on Large-Scale Datasets

Fast Asynchronous Parallel Stochastic Gradient Decent

Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach With Convergence Guarantee

Adaptive Variance Reducing for Stochastic Gradient Descent.

A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates.

Convergence Analysis of Asynchronous Stochastic Recursive Gradient Algorithms

VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning

Analysis of the Variance Reduction in SVRG and a New Acceleration Method.

ASVRG: Accelerated Proximal SVRG.

Accelerating SGD Using Flexible Variance Reduction on Large-Scale Datasets

Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

SGD-rα: A Real-Time α-Suffix Averaging Method for SGD with Biased Gradient Estimates

Asynchronous Parallel, Sparse Approximated SVRG for High-Dimensional Machine Learning

SVRG with Adaptive Epoch Size.

Accelerated Variance Reduced Stochastic ADMM

A Sharp Convergence Rate for the Asynchronous Stochastic Gradient Descent

Kill a Bird with Two Stones: Closing the Convergence Gaps in Non-Strongly Convex Optimization by Directly Accelerated SVRG with Double Compensation and Snapshots.