Abstract: Gradient-based minimax optimal algorithms have greatly promoted the development of continuous optimization and machine learning. One seminal work due to Yurii Nesterov [Nes83a] established $\tilde{\mathcal{O}}(\sqrt{L/\mu})$ gradient complexity for minimizing an $L$-smooth $\mu$-strongly convex objective. However, an ideal algorithm would adapt to the explicit complexity of a particular objective function and incur faster rates for simpler problems, which prompts us to reconsider two defects of existing optimization modeling and analysis. (i) Worst-case optimality is neither instance optimality nor the optimality seen in practice. (ii) The traditional $L$-smoothness condition may not be the primary abstraction/characterization for modern practical problems. In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning, including linear regression and beyond. We introduce two factors $(\alpha, \tau_{\alpha})$ to refine the description of the degenerate condition of optimization problems, based on the observation that the singular values of the Hessian often drop sharply. We design adaptive algorithms that solve simpler problems without prior knowledge, using fewer gradient or analogous oracle accesses. The algorithms also improve the state-of-the-art complexities for several problems in machine learning, thereby solving the open problem of how to design faster algorithms in light of the known complexity lower bounds. Specifically, when the nuclear norm is bounded by $\mathcal{O}(1)$, we achieve an optimal $\tilde{\mathcal{O}}(\mu^{-1/3})$ (vs. $\tilde{\mathcal{O}}(\mu^{-1/2})$) gradient complexity for linear regression. We hope this work prompts a rethinking of the difficulty of modern problems in optimization.
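
To make the regime described in the abstract concrete, the following is a minimal numerical sketch, not the paper's algorithm. It assumes a synthetic Hessian whose eigenvalues follow a power law $i^{-2}$ plus a small floor; the function names `synthetic_hessian` and `spectrum_summary`, the decay exponent, and the floor value are all hypothetical choices made for illustration. It only shows that the nuclear norm (trace) can stay $\mathcal{O}(1)$ while the condition number $L/\mu$ is large, and it evaluates the two rate expressions from the abstract with constants and logarithmic factors ignored.

```python
import numpy as np


def synthetic_hessian(d=50, decay=2.0, mu_floor=1e-4, seed=0):
    """Symmetric PSD matrix with eigenvalues i^(-decay) + mu_floor (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    # Random orthonormal eigenbasis from a QR factorization.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    eigs = np.arange(1, d + 1, dtype=float) ** (-decay) + mu_floor
    # Q @ diag(eigs) @ Q.T, written with broadcasting over columns.
    return (Q * eigs) @ Q.T


def spectrum_summary(H):
    """Return L, mu, the nuclear norm, and the two rate expressions from the abstract."""
    eigvals = np.linalg.eigvalsh(H)  # ascending order for symmetric input
    mu, L = eigvals[0], eigvals[-1]
    return {
        "L": L,
        "mu": mu,
        "nuclear_norm": eigvals.sum(),        # trace of a PSD matrix
        "sqrt_condition": np.sqrt(L / mu),    # worst-case rate, up to log factors
        "mu_minus_one_third": mu ** (-1.0 / 3.0),  # rate quoted for the bounded nuclear norm case
    }


if __name__ == "__main__":
    summary = spectrum_summary(synthetic_hessian())
    for name, value in summary.items():
        print(f"{name}: {value:.4g}")
```

On such a spectrum the trace is dominated by the first few eigenvalues, which is the sense in which the problem is "simpler" than its worst-case condition number suggests.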

Laplacian Smoothing Gradient Descent

Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo

Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods

Predictive Local Smoothness for Stochastic Gradient Methods

Gaussian smoothing gradient descent for minimizing functions (GSmoothGD)

A Deterministic Gradient-Based Approach to Avoid Saddle Points

Non-Uniform Smoothness for Gradient Descent

Random Smoothing Regularization in Kernel Gradient Descent Learning

Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

Smooth over-parameterized solvers for non-smooth structured optimization

Accelerated zero-order SGD under high-order smoothness and overparameterized regime

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization

A Smoothing Stochastic Gradient Method for Composite Optimization

Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

A New Conjugate Gradient Method with Smoothing $L_{1/2}$ Regularization Based on a Modified Secant Equation for Training Neural Networks

Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity

Smoothing $\mathcal{L}^2$ gradients in iterative regularization