Abstract: Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with data size, which is expensive for very large datasets. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems with linear prediction structure, e.g., least squares and logistic/softmax regression. HSDMPG enjoys improved computational complexity that is independent of data size for large-scale problems. It iteratively samples an evolving minibatch of individual losses to estimate the original problem, and can efficiently minimize the sampled subproblems. For a strongly convex loss with $n$ components, HSDMPG attains an $\epsilon$-optimization-error within $\mathcal {O} \left(\kappa \log ^{\zeta +1}\left(\frac{1}{\epsilon }\right)\frac{1}{\epsilon }\bigwedge n\log ^{\zeta }\left(\frac{1}{\epsilon }\right)\right)$ stochastic gradient evaluations, where $\kappa$ is the condition number, $\zeta =1$ for quadratic loss, and $\zeta =2$ for generic loss. For large-scale problems, our complexity improves on those of SVRG-type algorithms, whether or not their complexity depends on data size. In particular, when $\epsilon =\mathcal {O}(1/\sqrt{n})$, which matches the intrinsic excess error of a learning model and is sufficient for generalization, our complexity for quadratic and generic losses is respectively $\mathcal {O} (n^{0.5}\log ^{2}(n))$ and $\mathcal {O} (n^{0.5}\log ^{3}(n))$, which for the first time achieves optimal generalization in less than a single pass over the data. In addition, we extend HSDMPG to online strongly convex problems and prove that it is more efficient than prior algorithms. Numerical results demonstrate the computational advantages of HSDMPG.
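
As a quick check of the rates quoted in the abstract, the general bound can be specialized to $\epsilon = 1/\sqrt{n}$. The sketch below assumes that $\kappa$ is treated as a constant independent of $n$ and reads $\bigwedge$ as a minimum; neither assumption is stated explicitly in the abstract. With $1/\epsilon = \sqrt{n}$ and $\log (1/\epsilon ) = \frac{1}{2}\log n$, the two terms of the bound become

$$
\kappa \log ^{\zeta +1}\left(\frac{1}{\epsilon }\right)\frac{1}{\epsilon } = \frac{\kappa }{2^{\zeta +1}}\, n^{0.5}\log ^{\zeta +1}(n), \qquad n\log ^{\zeta }\left(\frac{1}{\epsilon }\right) = \frac{1}{2^{\zeta }}\, n\log ^{\zeta }(n).
$$

The ratio of the first term to the second is $\kappa \log (n)/(2\sqrt{n})$, which tends to zero as $n$ grows, so for large $n$ the minimum is attained by the first term:

$$
\mathcal {O}\left(n^{0.5}\log ^{\zeta +1}(n)\right) = \begin{cases} \mathcal {O}\left(n^{0.5}\log ^{2}(n)\right), & \zeta = 1 \text{ (quadratic loss)},\\ \mathcal {O}\left(n^{0.5}\log ^{3}(n)\right), & \zeta = 2 \text{ (generic loss)}. \end{cases}
$$

Under these assumptions both rates agree with those stated in the abstract, and both are sublinear in $n$, which is what "less than a single pass over the data" refers to.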

Proximal Gradient Method with Automatic Selection of the Parameter by Automatic Differentiation

Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems

Parameterized proximal-gradient algorithms for L1/L2 sparse signal recovery

Linear Convergence of Inexact Descent Method and Inexact Proximal Gradient Algorithms for Lower-Order Regularization Problems

A Proximal-Gradient Method for Constrained Optimization

A Bregman Proximal Stochastic Gradient Method with Extrapolation for Nonconvex Nonsmooth Problems

Variable Metric Proximal Gradient Method with Diagonal Barzilai-Borwein Stepsize

General Parameterized Proximal Point Algorithm with Applications in Statistical Learning

Bregman Proximal Gradient Algorithm with Extrapolation for a Class of Nonconvex Nonsmooth Minimization Problems

A class of modified accelerated proximal gradient methods for nonsmooth and nonconvex minimization problems

Nonconvex Stochastic Bregman Proximal Gradient Method for Nonconvex Composite Problems

MGProx: A nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization

Accelerated Proximal Gradient Methods for Nonconvex Programming

A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization

Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Proximal methods for structured nonsmooth optimization over Riemannian submanifolds

Scaled Proximal Gradient Methods for Sparse Optimization Problems

Smoothing composite proximal gradient algorithm for sparse group Lasso problems with nonsmooth loss functions

Adaptive Proximal Gradient Method for Convex Optimization

A proximal-gradient method for problems with overlapping group-sparse regularization: support identification complexity