Convex and Non-convex Optimization Under Generalized Smoothness

Haochuan Li, Jian Qian, Yi Tian, Alexander Rakhlin, Ali Jadbabaie
2023-11-03
Abstract: Classical analysis of convex and non-convex optimization methods often requires the Lipschitzness of the gradient, which limits the analysis to functions bounded by quadratics. Recent work relaxed this requirement to a non-uniform smoothness condition with the Hessian norm bounded by an affine function of the gradient norm, and proved convergence in the non-convex setting via gradient clipping, assuming bounded noise. In this paper, we further generalize this non-uniform smoothness condition and develop a simple, yet powerful analysis technique that bounds the gradients along the trajectory, thereby leading to stronger results for both convex and non-convex optimization problems. In particular, we obtain the classical convergence rates for (stochastic) gradient descent and Nesterov's accelerated gradient method in the convex and/or non-convex setting under this general smoothness condition. The new analysis approach does not require gradient clipping and allows heavy-tailed noise with bounded variance in the stochastic setting.
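For reference, the three smoothness conditions contrasted in the abstract can be written as follows for a twice-differentiable $f$. The notation is ours (the labels $L$, $L_0$, $L_1$ and $\ell$ are naming assumptions, not quoted from the paper), and $\ell$ is taken to be a non-decreasing function:

```latex
\begin{align*}
\text{Lipschitz gradient ($L$-smoothness):}\quad
  & \|\nabla^2 f(x)\| \le L, \\
\text{affine bound ($(L_0, L_1)$-smoothness):}\quad
  & \|\nabla^2 f(x)\| \le L_0 + L_1 \,\|\nabla f(x)\|, \\
\text{generalized ($\ell$-smoothness):}\quad
  & \|\nabla^2 f(x)\| \le \ell\bigl(\|\nabla f(x)\|\bigr).
\end{align*}
```

For example, $f(x) = e^x$ satisfies the affine bound with $L_0 = 0$ and $L_1 = 1$, since $f''(x) = f'(x)$, yet $f''$ is unbounded, so $f$ is not $L$-smooth for any $L$.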
Optimization and Control, Machine Learning
### What problem does this paper attempt to solve?

This paper studies the convergence of optimization algorithms, including gradient descent and stochastic gradient descent, on functions that satisfy generalized smoothness conditions. Specifically:

1. **Generalizing smoothness conditions**
   - The paper extends the classical Lipschitz smoothness condition to a more general ℓ-smoothness condition, in which the Hessian norm is bounded by a possibly nonlinear, non-decreasing function ℓ of the gradient norm. This covers a wider class of functions, including functions that are not bounded by quadratics.
2. **Analysis technique**
   - The paper proposes a new analysis technique that proves convergence of gradient descent, stochastic gradient descent, and Nesterov's accelerated gradient method by bounding the gradient norm along the optimization trajectory. The technique does not rely on gradient clipping and can handle heavy-tailed noise.
3. **Theoretical results**
   - Convergence of constant step-size gradient descent (GD), stochastic gradient descent (SGD), and Nesterov's accelerated gradient method (NAG) is proven in the convex and/or non-convex settings at the classical rates. In the convex setting, the rates match those known under Lipschitz smoothness, with NAG attaining the accelerated rate; in the non-convex setting, they match the best-known complexity.
4. **Noise assumptions**
   - In the stochastic setting, the noise assumption is relaxed from bounded noise to noise with bounded variance, which permits heavy-tailed noise and makes the analysis more applicable to real-world problems.

In summary, the paper establishes convergence of standard optimization algorithms over a broader class of functions by generalizing the smoothness condition and introducing a new analysis technique, and it obtains rates comparable to those for classically smooth functions. This broadens the range of practical problems for which these algorithms come with convergence guarantees.
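The trajectory-bounding idea behind item 2 can be illustrated on a toy problem. The sketch below is not the paper's proof or algorithm; it assumes a hypothetical test function f(x) = 2cosh(x) and hypothetical helpers `f`, `grad`, `ell`, and `gd_constant_step`. This f satisfies the affine bound f''(x) ≤ 2 + |f'(x)| but has unbounded curvature. The step size is chosen as η = 1/ℓ(G), where G bounds the gradient norm on the initial sublevel set, so the classical L-smooth reasoning applies with L = ℓ(G) as long as the iterates stay in that set.

```python
import math

# Hypothetical test function (our choice, not from the paper):
# f(x) = 2*cosh(x) is convex with minimizer x* = 0 and f(x*) = 2.
def f(x: float) -> float:
    return 2.0 * math.cosh(x)

def grad(x: float) -> float:
    return 2.0 * math.sinh(x)

def ell(g: float) -> float:
    # f''(x) = 2*cosh(x) <= 2 + 2*|sinh(x)| = 2 + |f'(x)|, because
    # cosh(x) - |sinh(x)| = exp(-|x|) <= 1. So ell(g) = 2 + g bounds the
    # curvature in terms of the gradient, although f'' itself is unbounded.
    return 2.0 + g

def gd_constant_step(x0: float, steps: int):
    """Constant step-size GD with the step chosen from a trajectory gradient bound."""
    # For this even, convex f the sublevel set {x : f(x) <= f(x0)} is
    # [-|x0|, |x0|], and |f'| is largest at its endpoints, so G bounds
    # the gradient norm on the whole sublevel set.
    G = abs(grad(x0))
    # On the sublevel set f'' <= ell(G), so ell(G) acts as a local
    # Lipschitz constant for the gradient; take the classical step 1/L.
    L = ell(G)
    eta = 1.0 / L
    x = x0
    traj = [x]
    for _ in range(steps):
        x = x - eta * grad(x)
        traj.append(x)
    return eta, G, traj

if __name__ == "__main__":
    x0, T = 3.0, 200
    eta, G, traj = gd_constant_step(x0, T)
    # The gradient norm along the trajectory never exceeds the bound G.
    print("max |f'(x_t)| =", max(abs(grad(x)) for x in traj), "<= G =", G)
    # Classical convex-GD bound: f(x_T) - f* <= |x0 - x*|^2 / (2 * eta * T).
    print("gap =", f(traj[-1]) - 2.0, "bound =", x0 ** 2 / (2 * eta * T))
```

The step size depends on the problem only through G, which is fixed by the starting point. In this one-dimensional example one can check by hand that the iterates never leave [-|x0|, |x0|]; in general, establishing that invariance is what the trajectory argument has to do.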