Abstract: SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization. However, SPIDER uses an accuracy-dependent stepsize that slows down convergence in practice, and it cannot handle objective functions that involve nonsmooth regularizers. In this paper, we propose SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with a proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantees. In particular, we show that proximal SpiderBoost achieves an oracle complexity of $\mathcal{O}(\min\{n^{1/2}\epsilon^{-2},\epsilon^{-3}\})$ in composite nonconvex optimization, improving the state-of-the-art result by a factor of $\mathcal{O}(\min\{n^{1/6},\epsilon^{-1/3}\})$. We further develop a novel momentum scheme to accelerate SpiderBoost for composite optimization, which achieves the near-optimal oracle complexity in theory and substantial improvement in experiments.

What problem does this paper attempt to address?

This paper addresses the following problems:

1. **Limitations of the SPIDER algorithm in practical applications**:
   - **Overly conservative stepsize selection**: SPIDER uses an accuracy-dependent stepsize $\eta = O(\epsilon / L)$, which can significantly slow convergence in practice.
   - **Inability to handle nonsmooth regularization terms**: SPIDER cannot be applied directly to objective functions that contain nonsmooth regularizers, even though such problems are very common in practice.
2. **Improving the SPIDER algorithm to enhance practical performance**:
   - **Proposing the SpiderBoost algorithm**: SpiderBoost allows a larger constant-level stepsize $\eta = O(1 / L)$ while maintaining the same near-optimal stochastic first-order oracle complexity as SPIDER, which accelerates convergence in practice.
   - **Extension to composite optimization problems**: The Prox-SpiderBoost algorithm handles composite optimization problems that contain nonsmooth regularization terms and has provable convergence guarantees.
3. **Further accelerating the algorithm**:
   - **Introducing a momentum mechanism**: The Prox-SpiderBoost-M algorithm uses momentum to further accelerate the solution of composite optimization problems. It achieves the near-optimal oracle complexity in theory and shows substantial speedups in experiments.

Specifically, the main contributions of the paper include:

- **SpiderBoost algorithm**: A new convergence analysis permits a larger constant-level stepsize, which significantly accelerates convergence in practice.
- **Prox-SpiderBoost algorithm**: An extension of SpiderBoost to composite optimization problems with nonsmooth regularization terms, achieving an oracle complexity of $\mathcal{O}(\min\{n^{1/2}\epsilon^{-2},\epsilon^{-3}\})$.
- **Prox-SpiderBoost-M algorithm**: A momentum variant that further accelerates the solution of composite optimization problems and achieves the near-optimal oracle complexity.

Together, these improvements strengthen the theoretical guarantees and yield faster convergence in practice.
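The proximal scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. Everything beyond the summary is an assumption made for illustration: the test problem (least squares with an $\ell_1$ regularizer, which is convex and chosen only for simplicity), the epoch length and mini-batch size of about $\sqrt{n}$, the specific constant stepsize $\eta = 1/(2L)$, and all function names.

```python
import math
import random


def soft_threshold(x, tau):
    """Proximal map of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(xi) - tau, 0.0), xi) for xi in x]


def mean_grad(x, A, b, idx):
    """Average gradient of f_i(x) = 0.5 * (a_i . x - b_i)^2 over indices idx."""
    g = [0.0] * len(x)
    for i in idx:
        r = sum(aj * xj for aj, xj in zip(A[i], x)) - b[i]
        for j, aj in enumerate(A[i]):
            g[j] += r * aj / len(idx)
    return g


def objective(x, A, b, lam):
    """Composite objective: mean of f_i(x) plus lam * ||x||_1."""
    smooth = sum(
        0.5 * (sum(aj * xj for aj, xj in zip(a, x)) - bi) ** 2
        for a, bi in zip(A, b)
    ) / len(A)
    return smooth + lam * sum(abs(xj) for xj in x)


def prox_spiderboost(A, b, lam, iters=200, seed=0):
    """Sketch of a proximal SPIDER-type method with a constant stepsize."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    # Assumption: epoch length and mini-batch size both about sqrt(n).
    q = batch = max(1, int(math.sqrt(n)))
    # Assumption: L is the largest smoothness constant of any f_i.
    L = max(sum(aj * aj for aj in a) for a in A)
    eta = 1.0 / (2.0 * L)  # constant stepsize, independent of the accuracy
    x = [0.0] * d
    x_prev, v = x, None
    for k in range(iters):
        if k % q == 0:
            # Start of an epoch: reset the estimator with a full gradient.
            v = mean_grad(x, A, b, range(n))
        else:
            # Recursive update: add the mini-batch gradient difference
            # between the current and previous iterates to the old estimate.
            idx = [rng.randrange(n) for _ in range(batch)]
            g_new = mean_grad(x, A, b, idx)
            g_old = mean_grad(x_prev, A, b, idx)
            v = [gn - go + vj for gn, go, vj in zip(g_new, g_old, v)]
        x_prev = x
        # Proximal gradient step handles the nonsmooth regularizer.
        x = soft_threshold([xj - eta * vj for xj, vj in zip(x, v)], eta * lam)
    return x
```

Calling `prox_spiderboost(A, b, lam)` with `A` as a list of rows and `b` as a list of targets returns the final iterate. Setting `lam = 0` reduces the proximal step to a plain gradient step, which corresponds to the smooth (non-composite) setting.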

SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization

Stochastic Variable Metric Proximal Gradient with variance reduction for non-convex composite optimization

Faster Stochastic Quasi-Newton Methods

Stochastic Momentum Method with Double Acceleration for Regularized Empirical Risk Minimization

Momentum-based variance-reduced stochastic Bregman proximal gradient methods for nonconvex nonsmooth optimization

Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds

A New Algorithm With Lower Complexity for Bilevel Optimization

D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems

Accelerated Stochastic Gradient-free and Projection-free Methods

Multi-stage stochastic gradient method with momentum acceleration

Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization

Variance Reduction with Sparse Gradients

Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate

Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

ASVRG: Accelerated Proximal SVRG

Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning

Generalized Majorization-Minimization for Non-Convex Optimization