Abstract: Classical convergence analyses for optimization algorithms rely on the widely adopted uniform smoothness assumption. However, recent experimental studies have demonstrated that many machine learning problems exhibit non-uniform smoothness, meaning the smoothness factor is a function of the model parameter instead of a universal constant. In particular, it has been observed that the smoothness grows with the gradient norm along the training trajectory. Motivated by this phenomenon, the recently introduced $(L_0, L_1)$-smoothness generalizes traditional $L$-smoothness and captures this positive relationship between smoothness and gradient norm. Under this type of non-uniform smoothness, existing work has designed stochastic first-order algorithms that use gradient clipping to obtain the optimal $\mathcal{O}(\epsilon^{-3})$ sample complexity for finding an $\epsilon$-approximate first-order stationary solution. Nevertheless, quasi-Newton methods remain little studied in this setting. Motivated by the higher accuracy and greater robustness of quasi-Newton methods, in this paper we propose a fast stochastic quasi-Newton method for problems with non-uniform smoothness. Leveraging gradient clipping and variance reduction, our algorithm achieves the best-known $\mathcal{O}(\epsilon^{-3})$ sample complexity and enjoys a convergence speedup with simple hyperparameter tuning. Our numerical experiments show that the proposed algorithm outperforms state-of-the-art approaches.
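
For orientation, the two notions the abstract relies on can be written out as follows. This is a sketch of the forms commonly used in the literature, not the paper's own statements. It assumes $f$ is twice differentiable, and the symbols $\eta$ (step size), $\gamma$ (clipping threshold), and $g_t$ (stochastic gradient estimate at $x_t$) are illustrative names that do not appear in the abstract.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x,
$$

$$
x_{t+1} = x_t - \min\left\{\eta, \frac{\gamma}{\|g_t\|}\right\} g_t .
$$

Setting $L_1 = 0$ in the first condition recovers the usual $L$-smoothness bound with $L = L_0$. The second line is a generic clipped first-order step, which caps the update length at $\gamma$. The quasi-Newton method proposed here would additionally involve a Hessian approximation and variance reduction, neither of which is specified in the abstract.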

Nesterov's Acceleration For Approximate Newton.

Approximate Newton Methods and Their Local Convergence.

Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness

A Note on Nesterov's Accelerated Method in Nonconvex Optimization: a Weak Estimate Sequence Approach

A Unifying Framework for Convergence Analysis of Approximate Newton Methods.

Stochastic Momentum Method with Double Acceleration for Regularized Empirical Risk Minimization

On adapting Nesterov's scheme to accelerate iterative methods for linear problems

A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness

The Nesterov-Spokoiny Acceleration: $o(1/k^2)$ Convergence without Proximal Operations

Stochastic Newton Proximal Extragradient Method

Stagewise Accelerated Stochastic Gradient Methods for Nonconvex Optimization

Delayed supermartingale convergence lemmas for stochastic approximation with Nesterov momentum

Stochastic Sub-Sampled Newton Method with Variance Reduction

The "Black-Box" Optimization Problem: Zero-Order Accelerated Stochastic Method via Kernel Approximation

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

Fast Multiobjective Gradient Methods with Nesterov Acceleration via Inertial Gradient-Like Systems

Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates

Accelerated Proximal Subsampled Newton Method

Nesterov acceleration despite very noisy gradients

Convergence Analysis of Accelerated Stochastic Gradient Descent under the Growth Condition