Abstract: This paper describes a novel algorithmic framework to minimize a finite sum of functions available over a network of nodes. The proposed framework, which we call~\GTVR, is stochastic and decentralized, and is thus particularly suitable for problems where large-scale, potentially private data cannot be collected or processed at a centralized server. The \GTVR~framework leads to a family of algorithms with two key ingredients: (i) \textit{local variance reduction}, which enables estimating the local batch gradients from arbitrarily drawn samples of local data; and (ii) \textit{global gradient tracking}, which fuses the gradient information across the nodes. Naturally, combining different variance reduction and gradient tracking techniques leads to different algorithms of interest with valuable practical tradeoffs and design considerations. Our focus in this paper is on two instantiations of the~\GTVR~framework, namely~\textbf{\texttt{GT-SAGA}} and~\textbf{\texttt{GT-SVRG}}, which, like their centralized counterparts (\SAGA~and~\SVRG), exhibit a compromise between space and time. We show that both~\textbf{\texttt{GT-SAGA}} and~\textbf{\texttt{GT-SVRG}} achieve accelerated linear convergence for smooth and strongly convex problems, and we further describe the regimes in which they achieve non-asymptotic, network-independent linear convergence rates that are faster than those of existing decentralized first-order schemes. Moreover, we show that both algorithms achieve a linear speedup in such regimes, in that the total number of gradient computations required at each node is reduced by a factor of $1/n$, where $n$ is the number of nodes, compared to their centralized counterparts that process all data at a single node. Extensive simulations illustrate the convergence behavior of the corresponding algorithms.
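To make the two ingredients concrete, the following is a minimal sketch of a GT-SAGA-style iteration. It assumes the standard SAGA gradient table at each node and the usual gradient-tracking recursion over a doubly stochastic mixing matrix, and it may differ in detail from the updates analyzed in the paper. The scalar least-squares costs and the names `gt_saga`, `weights`, `step`, and `iters` are illustrative assumptions, not taken from the source.

```python
import random


def gt_saga(a, b, weights, step, iters, seed=0):
    """Sketch of a GT-SAGA-style iteration for scalar least squares.

    Assumed setup (illustrative): node i holds samples (a[i][j], b[i][j])
    with component cost f_ij(x) = 0.5 * (a[i][j] * x - b[i][j]) ** 2, and
    `weights` is a doubly stochastic mixing matrix for the network.
    """
    rng = random.Random(seed)
    n = len(a)

    def grad(i, j, x):
        # Gradient of the j-th component cost at node i.
        return a[i][j] * (a[i][j] * x - b[i][j])

    x = [0.0] * n
    # SAGA table: most recent gradient of every local component function.
    table = [[grad(i, j, x[i]) for j in range(len(a[i]))] for i in range(n)]
    table_avg = [sum(t) / len(t) for t in table]
    # The initial estimator is the local batch gradient; the tracker starts there.
    g = list(table_avg)
    y = list(g)

    for _ in range(iters):
        # Mix neighbors' iterates, then descend along the tracked gradient.
        x = [sum(weights[i][r] * x[r] for r in range(n)) - step * y[i]
             for i in range(n)]
        g_new = []
        for i in range(n):
            m = len(a[i])
            j = rng.randrange(m)  # draw one local sample uniformly
            fresh = grad(i, j, x[i])
            # Local variance reduction: fresh sample gradient, minus its
            # stored value, plus the average of the stored gradients.
            g_new.append(fresh - table[i][j] + table_avg[i])
            # Refresh the table entry and its running average.
            table_avg[i] += (fresh - table[i][j]) / m
            table[i][j] = fresh
        # Global gradient tracking: mix neighbors' trackers, then add the
        # change in the local estimator.
        y = [sum(weights[i][r] * y[r] for r in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        g = g_new
    return x
```

In this sketch each node stores one gradient per local sample, which is the space cost of the SAGA-type estimator; an SVRG-type estimator would instead recompute a local batch gradient periodically, reflecting the space and time compromise mentioned above.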
