Abstract: For large-scale optimization, which covers a wide range of problems encountered in machine learning and deep neural networks, stochastic optimization has become one of the most widely used methods thanks to its low computational complexity. In machine learning and deep learning, nonconvex problems are common, while convex problems are rare. Finding the global minimum in nonconvex optimization while reducing the computational complexity remains a challenge. Inspired by the observation that a stagewise stepsize tuning strategy empirically improves the convergence speed in deep neural networks, we incorporate this strategy into the iterative framework of Nesterov's acceleration- and variance reduction-based methods to reduce the computational complexity, i.e., the stagewise stepsize tuning strategy is incorporated into randomized stochastic accelerated gradient and stochastic variance-reduced gradient. The proposed methods are theoretically shown to reduce the complexity for nonconvex and convex problems and to improve the convergence rate of the frameworks, which have the complexity O(1/με) and O(1/με), respectively, where μ is the PL modulus and L is the Lipschitz constant. Finally, numerical experiments on large benchmark datasets validate the competitiveness of the proposed methods.
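
The abstract uses μ (the PL modulus) and L (the Lipschitz constant) without defining them. As a reading aid, the conventional definitions are sketched below. These are the standard textbook forms and are an assumption here; the paper's exact normalization may differ. Here f is the objective and f* its minimum value.

$$
\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\| \quad \text{for all } x, y
\qquad \text{(L-smoothness)}
$$

$$
\tfrac{1}{2}\,\|\nabla f(x)\|^{2} \ge \mu\,\bigl(f(x) - f^{*}\bigr) \quad \text{for all } x
\qquad \text{(Polyak-Łojasiewicz condition)}
$$

Under the PL condition every stationary point is a global minimizer, since ∇f(x) = 0 forces f(x) = f*. This is why a global minimum can be targeted even when f is nonconvex.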

What problem does this paper attempt to address?

### The Problem Addressed by the Paper

This paper aims to address the challenges of non-convex optimization in large-scale optimization problems. Specifically, the paper focuses on the following points:

1. **Reducing Computational Complexity**: In machine learning and deep neural networks, optimization problems are often non-convex, making it very difficult to find the global minimum. Traditional Stochastic Gradient Descent (SGD) methods, while having low computational complexity, converge slowly, especially when dealing with large-scale datasets.
2. **Improving Convergence Speed**: To accelerate the convergence of non-convex optimization problems, the paper introduces a staged step size adjustment strategy (SSTS). By combining Nesterov acceleration and variance reduction techniques, the paper proposes new optimization algorithms: Staged Accelerated Randomized Stochastic Gradient (S-RSAG) and Staged Accelerated Variance Reduced Gradient (S-SVRG).
3. **Theoretical Analysis and Experimental Validation**: The paper theoretically proves that the proposed algorithms have complexities of O(1/µϵ) and O(L/µϵ) for non-convex and convex optimization problems, respectively. It also conducts experimental validation on multiple benchmark datasets, demonstrating the effectiveness and competitiveness of these algorithms.

### Main Contributions of the Paper

1. **Staged Accelerated Randomized Stochastic Gradient (S-RSAG)**:
   - For non-convex optimization problems, the iteration complexity of S-RSAG is O(L/µϵ), significantly lower than the non-staged version RSAG's O(L^2/ϵ + L/ϵ^2).
   - For convex optimization problems, the iteration complexity of S-RSAG is O(1/µϵ), also superior to the non-staged version RSAG's O(L/√ϵ + 1/ϵ^2).
2. **Staged Accelerated Variance Reduced Gradient (S-SVRG)**:
   - For non-convex optimization problems, the iteration complexity of S-SVRG is O(Lm/(µ√ϵ)), significantly better than the non-staged version SVRG.
   - For convex optimization problems, the iteration complexity of S-SVRG is O(Lm/(µ^2√ϵ)), also superior to the non-staged version SVRG.
3. **Experimental Results**:
   - Experiments on datasets such as MNIST, CIFAR-10, REAL-SIM, and RCV1 validate the superior performance of S-RSAG and S-SVRG in terms of loss value, training accuracy, and testing accuracy.

### Summary

By introducing a staged step size adjustment strategy and combining Nesterov acceleration and variance reduction techniques, this paper proposes two new optimization algorithms, S-RSAG and S-SVRG. These algorithms show significant performance improvements both theoretically and experimentally, particularly in handling large-scale non-convex optimization problems, effectively reducing computational complexity and improving convergence speed.
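
To make the staged step size idea concrete, here is a minimal Python sketch of a stagewise wrapper around standard SVRG. It is not the paper's S-SVRG: the function names (`svrg_stage`, `stagewise_svrg`, `demo`), the halving schedule (`decay=0.5`), the warm start between stages, and the noise-free least-squares test problem are all illustrative assumptions. The only idea taken from the text is that the step size is held fixed within a stage and adjusted between stages.

```python
import numpy as np


def svrg_stage(grad_i, full_grad, x0, n, step, epochs, inner_steps, rng):
    """Run standard SVRG from x0 with one constant step size (one stage)."""
    snapshot = x0.copy()
    for _ in range(epochs):
        g_full = full_grad(snapshot)  # full gradient at the snapshot point
        x = snapshot.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)  # sample one component uniformly at random
            # Variance-reduced estimate: unbiased for the full gradient at x.
            g = grad_i(x, i) - grad_i(snapshot, i) + g_full
            x = x - step * g
        snapshot = x  # the last inner iterate becomes the next snapshot
    return snapshot


def stagewise_svrg(grad_i, full_grad, x0, n, step0, stages=4, epochs=5,
                   inner_steps=None, decay=0.5, seed=0):
    """Hypothetical stagewise wrapper: constant step per stage, shrunk between stages."""
    rng = np.random.default_rng(seed)
    inner_steps = 2 * n if inner_steps is None else inner_steps
    x, step = np.asarray(x0, dtype=float).copy(), step0
    for _ in range(stages):
        # Warm start: each stage begins where the previous one ended.
        x = svrg_stage(grad_i, full_grad, x, n, step, epochs, inner_steps, rng)
        step *= decay  # assumed geometric schedule, not taken from the paper
    return x


def demo():
    """Noise-free least squares, f(x) = 1/(2n) * ||A x - b||^2 (assumed test problem)."""
    rng = np.random.default_rng(1)
    n, d = 200, 5
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d)
    loss = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
    grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])   # gradient of the i-th term
    full_grad = lambda x: A.T @ (A @ x - b) / n      # average of all term gradients
    step0 = 0.1 / np.max(np.sum(A ** 2, axis=1))     # 0.1 / largest per-sample smoothness
    x0 = np.zeros(d)
    x = stagewise_svrg(grad_i, full_grad, x0, n, step0)
    return loss(x0), loss(x)
```

Calling `demo()` returns the loss at the starting point and at the final iterate, so the two can be compared directly. The stage count, epochs per stage, and decay factor are tuning knobs in this sketch, and the right values depend on the problem.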

Stagewise Accelerated Stochastic Gradient Methods for Nonconvex Optimization
