Abstract: We consider minimizing finite-sum and expectation objective functions via Hessian-averaging-based subsampled Newton methods. These methods allow for gradient inexactness and have fixed per-iteration Hessian approximation costs. Recent work (Na et al., 2023) demonstrated that Hessian averaging can be used to achieve fast $\mathcal{O}\left(\sqrt{\tfrac{\log k}{k}}\right)$ local superlinear convergence for strongly convex functions in high probability, while maintaining fixed per-iteration Hessian costs. These methods, however, require gradient exactness and strong convexity, which pose challenges for their practical implementation. To address this concern, we consider Hessian-averaged methods that allow gradient inexactness via norm-condition-based adaptive sampling strategies. For the finite-sum problem, we use deterministic sampling techniques, which lead to global linear and sublinear convergence rates for strongly convex and nonconvex functions, respectively. In this setting, we derive an improved deterministic local superlinear convergence rate of $\mathcal{O}\left(\tfrac{1}{k}\right)$. For the expectation problem, we use stochastic sampling techniques and derive global linear and sublinear rates for strongly convex and nonconvex functions, as well as an $\mathcal{O}\left(\tfrac{1}{\sqrt{k}}\right)$ local superlinear convergence rate, all in expectation. We present novel analysis techniques that differ from the previous probabilistic results. Additionally, we propose scalable and efficient variants of these methods via diagonal approximations and derive the novel diagonally-averaged Newton (Dan) method for large-scale problems. Our numerical results demonstrate that Hessian averaging not only helps with convergence but can lead to state-of-the-art performance on difficult problems such as CIFAR100 classification with ResNets.
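The combination described in the abstract (a fixed-size Hessian sample per iteration, curvature averaged over all past iterations, and a gradient batch that grows until a norm condition holds) can be illustrated with a minimal sketch. This is not the authors' Dan method. It assumes a synthetic least-squares finite sum $F(x) = \tfrac{1}{n}\sum_i \tfrac12 (a_i^\top x - b_i)^2$, whose per-sample Hessian diagonal $a_{ij}^2$ is known exactly. It also assumes uniform averaging of subsampled Hessian diagonals and a damped step size `alpha`, and it replaces the exact norm condition with a common variance-based approximation, $\mathrm{Var}_S / |S| \le \theta^2 \|g_S\|^2$. All names and parameters (`make_problem`, `diag_averaged_newton`, `theta`, `hess_batch`, `alpha`) are hypothetical.

```python
import math
import random


def make_problem(n=200, seed=0):
    """Hypothetical synthetic least-squares data: b_i = a_i . x_true + noise."""
    rng = random.Random(seed)
    x_true = [1.0, -2.0, 3.0]
    A, b = [], []
    for _ in range(n):
        a = [rng.uniform(-1.0, 1.0) for _ in x_true]
        A.append(a)
        b.append(sum(aj * xj for aj, xj in zip(a, x_true)) + 0.1 * rng.gauss(0.0, 1.0))
    return A, b


def residual(a, bi, x):
    return sum(aj * xj for aj, xj in zip(a, x)) - bi


def loss(A, b, x):
    """Full finite-sum objective F(x)."""
    return sum(0.5 * residual(a, bi, x) ** 2 for a, bi in zip(A, b)) / len(A)


def sampled_gradient(A, b, x, idx):
    """Mean gradient over the batch idx and the sample variance of per-sample gradients."""
    grads = [[residual(A[i], b[i], x) * aj for aj in A[i]] for i in idx]
    m = len(idx)
    g = [sum(col) / m for col in zip(*grads)]
    var = sum(sum((gi - gj) ** 2 for gi, gj in zip(gr, g)) for gr in grads) / max(m - 1, 1)
    return g, var


def diag_averaged_newton(A, b, iters=50, theta=0.5, batch0=8, hess_batch=32, alpha=0.5, seed=1):
    """Sketch: diagonal Hessian averaging plus adaptive gradient sampling (assumed variant)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    diag_sum = [0.0] * d  # running sum of subsampled Hessian diagonals
    batch = batch0
    batches = []
    for k in range(iters):
        # Grow the gradient batch until the variance-based norm test holds
        # (or the batch is the full data set, giving the exact gradient).
        while True:
            idx = rng.sample(range(n), batch)
            g, var = sampled_gradient(A, b, x, idx)
            gnorm2 = sum(gi * gi for gi in g)
            if batch == n or var / batch <= theta ** 2 * gnorm2:
                break
            needed = math.ceil(var / (theta ** 2 * gnorm2)) if gnorm2 > 0 else n
            batch = min(n, max(batch + 1, needed))
        batches.append(batch)
        # Fixed per-iteration Hessian cost: diagonal of f_i's Hessian is a_ij^2.
        hidx = rng.sample(range(n), hess_batch)
        for j in range(d):
            diag_sum[j] += sum(A[i][j] ** 2 for i in hidx) / hess_batch
        # Uniform average of all Hessian-diagonal samples seen so far.
        diag_avg = [s / (k + 1) for s in diag_sum]
        # Damped diagonal Newton step with a small floor to avoid division by zero.
        x = [xj - alpha * gj / max(dj, 1e-8) for xj, gj, dj in zip(x, g, diag_avg)]
    return x, batches


if __name__ == "__main__":
    A, b = make_problem()
    x, batches = diag_averaged_newton(A, b)
    print("final loss:", loss(A, b, x), "final batch size:", batches[-1])
```

In this sketch the gradient batch size never decreases, which mirrors the idea of tightening gradient accuracy as the iterates approach a solution. The Hessian sample size stays fixed, so the averaging, not a growing Hessian batch, is what reduces curvature noise over time.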