Abstract: Classical analysis of convex and non-convex optimization methods often requires Lipschitzness of the gradient, which limits the analysis to functions bounded by quadratics. Recent work relaxed this requirement to a non-uniform smoothness condition with the Hessian norm bounded by an affine function of the gradient norm, and proved convergence in the non-convex setting via gradient clipping, assuming bounded noise. In this paper, we further generalize this non-uniform smoothness condition and develop a simple, yet powerful analysis technique that bounds the gradients along the trajectory, thereby leading to stronger results for both convex and non-convex optimization problems. In particular, we obtain the classical convergence rates for (stochastic) gradient descent and Nesterov's accelerated gradient method in the convex and/or non-convex setting under this general smoothness condition. The new analysis approach does not require gradient clipping and allows heavy-tailed noise with bounded variance in the stochastic setting.
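The three smoothness notions mentioned in the abstract can be written side by side. The following is a sketch rather than the paper's exact definitions: the symbols $L$, $L_0$, $L_1$ and the assumption that $\ell$ is a non-decreasing function are notation introduced here for illustration, and the precise regularity requirements on $\ell$ should be taken from the paper itself.

```latex
% Classical smoothness: the gradient is L-Lipschitz
\|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\| \quad \text{for all } x, y .

% Non-uniform smoothness: Hessian norm bounded by an affine function of the gradient norm
\|\nabla^2 f(x)\| \le L_0 + L_1 \,\|\nabla f(x)\| .

% Generalized smoothness (sketch): Hessian norm bounded by a function \ell of the gradient norm,
% with \ell assumed non-decreasing here
\|\nabla^2 f(x)\| \le \ell\bigl(\|\nabla f(x)\|\bigr) .
```

Taking $\ell(u) = L$ recovers the classical case for twice-differentiable functions, and $\ell(u) = L_0 + L_1 u$ recovers the affine case.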

### What problem does this paper attempt to address?

This paper studies the convergence of optimization algorithms, including gradient descent and stochastic gradient descent, on functions whose gradient need not be Lipschitz. Specifically:

1. **Generalizing Smoothness Conditions**:
   - The paper extends the traditional Lipschitz smoothness condition to a more general ℓ-smoothness condition. Under this condition the bound on the Hessian norm may depend on the gradient norm, possibly nonlinearly, which accommodates a wider range of functions.
2. **Analytical Methods**:
   - A new analysis technique is proposed that proves convergence of several optimization algorithms (gradient descent, stochastic gradient descent, and Nesterov's accelerated gradient method) by bounding the gradients along the optimization trajectory. This technique does not rely on gradient clipping and can handle heavy-tailed noise.
3. **Theoretical Results**:
   - Convergence of constant step-size gradient descent (GD), stochastic gradient descent (SGD), and Nesterov's accelerated gradient method (NAG) is proven in the convex and/or non-convex setting, recovering the classical convergence rates. For convex functions, these methods reach the optimal convergence rate; for non-convex functions, they achieve the best-known complexity.
4. **Noise Assumptions**:
   - The noise assumption is relaxed from bounded noise to noise with bounded variance, which makes the analysis applicable to a wider range of real-world problems.

In summary, this paper establishes convergence of optimization algorithms over a broader class of functions by generalizing the smoothness condition and introducing a new analysis technique, and it obtains convergence rates comparable to those for traditionally smooth functions. This helps explain, and may help improve, the behavior of optimization algorithms in practice.
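The idea of bounding gradients along the trajectory can be illustrated on a toy problem. The sketch below is not taken from the paper: the test function f(x) = e^x + e^(-x), the constants L0 = 2 and L1 = 1, and the step-size rule 1 / (L0 + L1 * G) are all assumptions chosen for illustration. This function has no globally Lipschitz gradient, but it satisfies f''(x) <= 2 + |f'(x)|, so once the gradient is known to stay below its initial size G, constant step-size gradient descent behaves as it would on an ordinary smooth function.

```python
import math


def f(x):
    # Toy objective (an assumption for illustration): grows exponentially,
    # so its gradient is not globally Lipschitz.
    return math.exp(x) + math.exp(-x)


def grad(x):
    return math.exp(x) - math.exp(-x)


def hess(x):
    return math.exp(x) + math.exp(-x)


def gradient_descent(x0, step, num_steps):
    # Plain constant step-size gradient descent, with no clipping.
    xs = [x0]
    x = x0
    for _ in range(num_steps):
        x = x - step * grad(x)
        xs.append(x)
    return xs


# Hypothetical constants for this toy function: f'' = sqrt(f'^2 + 4) <= 2 + |f'|.
L0, L1 = 2.0, 1.0

x0 = 3.0
G = abs(grad(x0))            # initial gradient size
L_local = L0 + L1 * G        # smoothness bound valid while |f'| <= G
step = 1.0 / L_local         # illustrative step-size choice, not the paper's

xs = gradient_descent(x0, step, num_steps=500)
grad_norms = [abs(grad(x)) for x in xs]

print("largest gradient along the trajectory:", max(grad_norms))
print("initial gradient size G:", G)
print("final iterate:", xs[-1], "final value:", f(xs[-1]))
```

In this run the gradient never exceeds its initial size, which is the property that allows a locally valid smoothness constant to stand in for a global one. A stochastic variant would replace `grad(x)` with a noisy estimate; that case is not sketched here.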

Convex and Non-convex Optimization Under Generalized Smoothness
