Abstract: In this paper, a new theory is developed for first-order stochastic convex optimization, showing that the global convergence rate is sufficiently quantified by a local growth rate of the objective function in a neighborhood of the optimal solutions. In particular, if the objective function [Formula: see text] in the [Formula: see text]-sublevel set grows as fast as [Formula: see text], where [Formula: see text] represents the closest optimal solution to [Formula: see text] and [Formula: see text] quantifies the local growth rate, the iteration complexity of first-order stochastic optimization for achieving an [Formula: see text]-optimal solution can be [Formula: see text], which is optimal at most up to a logarithmic factor. To achieve this faster global convergence, we develop two different accelerated stochastic subgradient methods that iteratively solve the original problem approximately in a local region around a historical solution, with the size of the local region gradually decreasing as the solution approaches the optimal set. Besides the theoretical improvements, this work also includes new contributions toward making the proposed algorithms practical: (i) we present practical variants of the accelerated stochastic subgradient methods that can run without knowledge of the multiplicative growth constant or even the growth rate [Formula: see text]; (ii) we consider a broad family of problems in machine learning to demonstrate that the proposed algorithms enjoy faster convergence than the traditional stochastic subgradient method. We also characterize the complexity of the proposed algorithms for ensuring that the gradient is small without the smoothness assumption.
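
The restart idea described in the abstract (approximately solving the problem in a local region around a previous solution, then shrinking that region) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy objective F(w) = ||w||_1, the Gaussian noise model, the choice to halve both the step size and the region radius after each stage, and the helper names project_onto_ball, noisy_l1_subgradient, and restarted_ssg.

```python
import math
import random


def project_onto_ball(w, center, radius):
    """Euclidean projection of w onto the ball of the given radius around center."""
    diff = [wi - ci for wi, ci in zip(w, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(w)
    scale = radius / norm
    return [ci + scale * d for ci, d in zip(center, diff)]


def noisy_l1_subgradient(w, rng, noise_std=0.5):
    """Stochastic subgradient of the toy objective F(w) = ||w||_1 (an assumption)."""
    def sign(x):
        return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    return [sign(wi) + rng.gauss(0.0, noise_std) for wi in w]


def restarted_ssg(w0, subgrad, stages, iters_per_stage, step0, radius0, seed=0):
    """Stage-wise stochastic subgradient method with a shrinking local region.

    Each stage runs projected stochastic subgradient steps inside a ball
    centered at the previous stage's output, then returns the averaged
    iterate. The halving schedule below is an illustrative assumption.
    """
    rng = random.Random(seed)
    center = list(w0)
    step, radius = step0, radius0
    for _ in range(stages):
        w = list(center)
        running_sum = [0.0] * len(w)
        for _ in range(iters_per_stage):
            # Accumulate the current iterate for the stage average.
            running_sum = [s + wi for s, wi in zip(running_sum, w)]
            g = subgrad(w, rng)
            w = [wi - step * gi for wi, gi in zip(w, g)]
            # Keep the iterate inside the local region around the stage center.
            w = project_onto_ball(w, center, radius)
        # The averaged iterate becomes the center of the next, smaller region.
        center = [s / iters_per_stage for s in running_sum]
        step *= 0.5
        radius *= 0.5
    return center


if __name__ == "__main__":
    w_final = restarted_ssg([3.0, -2.0], noisy_l1_subgradient,
                            stages=8, iters_per_stage=500,
                            step0=0.1, radius0=5.0)
    print("final iterate:", w_final)
    print("objective value:", sum(abs(x) for x in w_final))
```

Running the script prints the final averaged iterate and its objective value for the toy problem. The stage-wise design reflects the abstract's description only loosely: the actual methods, their step-size rules, and their guarantees are those given in the paper, not this sketch.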

Anytime Acceleration of Gradient Descent

Open Problem: Anytime Convergence Rate of Gradient Descent

Accelerated Gradient Descent via Long Steps

Provably Faster Gradient Descent via Long Steps

Accelerating Proximal Gradient Descent via Silver Stepsizes

Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates

Accelerated Objective Gap and Gradient Norm Convergence for Gradient Descent via Long Steps

Accelerated Gradient Descent by Concatenation of Stepsize Schedules

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

Gradient Methods with Online Scaling

Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule

The Anytime Convergence of Stochastic Gradient Descent with Momentum: From a Continuous-Time Perspective

Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

Convergence Analysis of Accelerated Stochastic Gradient Descent under the Growth Condition

Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction

Composing Optimized Stepsize Schedules for Gradient Descent

Accelerate Stochastic Subgradient Method by Leveraging Local Growth Condition

Optimal Adaptive and Accelerated Stochastic Gradient Descent

An Analysis of Asynchronous Stochastic Accelerated Coordinate Descent

Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates