Abstract: Classical convergence analyses of optimization algorithms rely on the widely adopted uniform smoothness assumption. However, recent empirical studies have shown that many machine learning problems exhibit non-uniform smoothness, meaning that the smoothness factor is a function of the model parameter rather than a universal constant. In particular, it has been observed that smoothness grows with the gradient norm along the training trajectory. Motivated by this phenomenon, the recently introduced $(L_0, L_1)$-smoothness is a more general notion than traditional $L$-smoothness and captures this positive relationship between smoothness and gradient norm. Under this type of non-uniform smoothness, existing work has designed stochastic first-order algorithms that use gradient clipping to obtain the optimal $\mathcal{O}(\epsilon^{-3})$ sample complexity for finding an $\epsilon$-approximate first-order stationary solution. Nevertheless, quasi-Newton methods have received little study in this setting. Given the higher accuracy and greater robustness of quasi-Newton methods, in this paper we propose a fast stochastic quasi-Newton method for problems with non-uniform smoothness. Leveraging gradient clipping and variance reduction, our algorithm achieves the best-known $\mathcal{O}(\epsilon^{-3})$ sample complexity and enjoys a convergence speedup with simple hyperparameter tuning. Our numerical experiments show that the proposed algorithm outperforms state-of-the-art approaches.
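For twice-differentiable $f$, $(L_0, L_1)$-smoothness is commonly stated as $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$. A simple example is $f(x) = x^4$. It is not $L$-smooth because $f''(x) = 12x^2$ is unbounded, but it satisfies $12x^2 \le 12 + 3 \cdot 4|x|^3$ for every $x$, so it is $(12, 3)$-smooth. The following is a minimal one-dimensional sketch and should not be read as the paper's algorithm. It assumes an additive-noise gradient oracle and shows one way to combine the three ingredients the abstract names: gradient clipping (of the step), a STORM-style recursive variance-reduced gradient estimator, and a secant curvature estimate (in 1-D, the BFGS update reduces exactly to $H = s / y$). All function names, the toy objective, and the hyperparameter values are hypothetical choices made for illustration.

```python
import random


def grad(x):
    """Exact derivative of the illustrative objective f(x) = x**4 (hypothetical toy)."""
    return 4.0 * x ** 3


def hess(x):
    """Exact second derivative of f(x) = x**4."""
    return 12.0 * x ** 2


def is_l0_l1_smooth_at(x, l0=12.0, l1=3.0):
    """Check the 1-D (L0, L1)-smoothness inequality |f''(x)| <= L0 + L1 * |f'(x)|."""
    return abs(hess(x)) <= l0 + l1 * abs(grad(x))


def clip(v, max_norm):
    """1-D clipping: shrink v so that |v| <= max_norm, keeping its sign."""
    return v if abs(v) <= max_norm else max_norm * v / abs(v)


def clipped_secant_vr(x0, steps=200, lr=0.5, max_step=0.5, beta=0.1,
                      noise=0.0, seed=0, h_min=1e-12):
    """Hypothetical sketch: clipped quasi-Newton steps on a variance-reduced gradient."""
    rng = random.Random(seed)
    x_prev = x0
    d = grad(x0) + noise * rng.gauss(0.0, 1.0)  # initial stochastic gradient
    h = 1.0  # scalar curvature estimate; 1/h plays the role of the BFGS matrix
    x = x0 - clip(lr * d / h, max_step)  # first step, clipped
    for _ in range(steps):
        xi = noise * rng.gauss(0.0, 1.0)  # one noise sample shared by both points
        g_new = grad(x) + xi
        g_old = grad(x_prev) + xi
        s, y = x - x_prev, g_new - g_old
        if s != 0.0 and y / s > h_min:  # accept only pairs with positive curvature
            h = y / s  # 1-D BFGS: the secant condition fixes H = s / y
        d = g_new + (1.0 - beta) * (d - g_old)  # STORM-style recursive estimator
        x_prev, x = x, x - clip(lr * d / h, max_step)  # clipped quasi-Newton step
    return x
```

With `noise=0.0`, `clipped_secant_vr(2.0)` clips only the first step (from 2.0 to 1.5) and then contracts $x$ toward the minimizer 0. Passing, for example, `noise=0.1` exercises the stochastic path, where clipping bounds each step even when the curvature estimate $h$ becomes very small near the minimizer.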

A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness

A Multilevel Low-Rank Newton Method with Super-linear Convergence Rate and its Application to Non-convex Problems

Stochastic Sub-Sampled Newton Method with Variance Reduction

A globally convergent proximal Newton-type method in nonsmooth convex optimization

Extended Newton Methods for Multiobjective Optimization: Majorizing Function Technique and Convergence Analysis

Convergence Behavior of Gauss-Newton's Method and Extensions of the Smale Point Estimate Theory.

SPAN: A Stochastic Projected Approximate Newton Method

A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization.

Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods

On the local convergence of the semismooth Newton method for composite optimization

Stochastic Newton Proximal Extragradient Method

Local SGD for Near-Quadratic Problems: Improving Convergence under Unconstrained Noise Conditions

Inexact Newton-type Methods for Optimisation with Nonnegativity Constraints

An inexact regularized proximal Newton method for nonconvex and nonsmooth optimization