Abstract: We consider minimizing finite-sum and expectation objective functions via Hessian-averaging based subsampled Newton methods. These methods allow for gradient inexactness and have fixed per-iteration Hessian approximation costs. The recent work (Na et al. 2023) demonstrated that Hessian averaging can be utilized to achieve fast $\mathcal{O}\left(\sqrt{\tfrac{\log k}{k}}\right)$ local superlinear convergence for strongly convex functions in high probability, while maintaining fixed per-iteration Hessian costs. These methods, however, require gradient exactness and strong convexity, which poses challenges for their practical implementation. To address this concern, we consider Hessian-averaged methods that allow gradient inexactness via norm condition based adaptive-sampling strategies. For the finite-sum problem, we utilize deterministic sampling techniques which lead to global linear and sublinear convergence rates for strongly convex and nonconvex functions, respectively. In this setting, we are able to derive an improved deterministic local superlinear convergence rate of $\mathcal{O}\left(\tfrac{1}{k}\right)$. For the expectation problem, we utilize stochastic sampling techniques and derive global linear and sublinear rates for strongly convex and nonconvex functions, as well as an $\mathcal{O}\left(\tfrac{1}{\sqrt{k}}\right)$ local superlinear convergence rate, all in expectation. We present novel analysis techniques that differ from the previous probabilistic results. Additionally, we propose scalable and efficient variations of these methods via diagonal approximations and derive the novel diagonally-averaged Newton (Dan) method for large-scale problems. Our numerical results demonstrate that Hessian averaging not only helps with convergence, but can lead to state-of-the-art performance on difficult problems such as CIFAR100 classification with ResNets.
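
As a rough illustration of the averaging idea, the sketch below runs Newton steps on a scalar finite-sum quadratic, replacing each subsampled Hessian with a running uniform average of all Hessian samples drawn so far. The uniform weighting, the toy objective, and every name in the code (`averaged_newton_1d`, `sample_size`, and so on) are assumptions made for illustration, not the paper's algorithm. The gradient is kept exact here, which is the setting of the earlier work rather than the inexact-gradient setting considered in this paper.

```python
import random


def averaged_newton_1d(a, b, x0, sample_size, iters, seed=0):
    """Newton iteration with a uniformly averaged subsampled Hessian.

    Toy objective (an assumption for this sketch):
        f(x) = (1/n) * sum_i 0.5 * a[i] * x**2 - b * x,   with a[i] > 0.
    """
    rng = random.Random(seed)
    n = len(a)
    a_bar = sum(a) / n          # true Hessian, used only for the exact gradient
    x = x0
    h_avg = 0.0
    for k in range(iters):
        grad = a_bar * x - b    # exact gradient of the toy objective
        batch = rng.sample(range(n), sample_size)
        h_hat = sum(a[i] for i in batch) / sample_size  # fixed-cost Hessian sample
        # Running uniform average over all Hessian samples seen so far.
        h_avg = (k * h_avg + h_hat) / (k + 1)
        x = x - grad / h_avg    # Newton step with the averaged Hessian
    return x, h_avg


data_rng = random.Random(1)
a = [data_rng.uniform(1.5, 2.5) for _ in range(1000)]
b = 3.0
x_star = b / (sum(a) / len(a))  # closed-form minimizer of the toy objective
x, h_avg = averaged_newton_1d(a, b, x0=10.0, sample_size=8, iters=50)
```

Each iteration draws only `sample_size` curvature samples, so the per-iteration Hessian cost stays fixed while the averaged estimate `h_avg` tightens around the true curvature as `k` grows.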

What problem does this paper attempt to address?

The paper aims to address the following issues:

1. **Optimization Problem**: The paper primarily focuses on the optimization problems of finite-sum and expectation objective functions. Specifically, these problems can be solved using subsampled Newton methods, which utilize Hessian averaging techniques to reduce the approximation cost of the Hessian matrix in each iteration.
2. **Hessian Matrix Approximation Problem**: Traditional subsampled Newton methods require a large sample size to achieve fast local linear or superlinear convergence, leading to high computational costs. The paper proposes a path-averaging-based Hessian matrix approximation method to overcome this limitation by reducing the variance in the estimation, allowing for good performance even with imprecise gradients.
3. **Gradient Imprecision**: In large-scale optimization problems, precisely calculating gradients is often impractical. The paper considers the case of imprecise gradients and proposes deterministic and stochastic adaptive gradient sampling strategies. These strategies ensure global linear and sublinear convergence rates even with imprecise gradient estimates and achieve superlinear convergence in locally strongly convex regions.
4. **Efficient Implementation**: To tackle large-scale problems, the paper also proposes several efficient algorithm variants, such as reducing the cost of Hessian matrix multiplication in each iteration through diagonal approximation and designing algorithms suitable for vectorized operations on GPUs.

In summary, the main goal of the paper is to improve the efficiency of solving finite-sum and expectation optimization problems by introducing Hessian averaging and adaptive gradient sampling strategies while maintaining theoretical convergence.
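
The adaptive gradient sampling idea described above can be sketched as a loop that grows the gradient sample until a norm condition holds. The specific condition used here, $\|g_S - \nabla f\| \le \theta \|g_S\|$, the doubling schedule, and all function names are assumptions chosen for illustration; the paper's exact condition may differ. The sketch also computes the full gradient to check the condition, which a practical method would avoid (for example by estimating the error from the sample variance).

```python
import math
import random


def mean_vec(vectors):
    """Componentwise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def sample_until_norm_condition(per_sample_grads, theta, start_size, seed=0):
    """Double the sample size until ||g_S - grad_f|| <= theta * ||g_S||.

    The full gradient is computed only to make the check exact in this toy;
    that is an illustrative shortcut, not a practical strategy.
    """
    rng = random.Random(seed)
    n = len(per_sample_grads)
    order = list(range(n))
    rng.shuffle(order)                   # nested samples: prefixes of one shuffle
    full = mean_vec(per_sample_grads)    # reference gradient (toy only)
    size = min(start_size, n)
    while True:
        g = mean_vec([per_sample_grads[i] for i in order[:size]])
        err = norm([gj - fj for gj, fj in zip(g, full)])
        if err <= theta * norm(g) or size == n:
            return g, size, err
        size = min(2 * size, n)          # grow the sample and retry


data_rng = random.Random(7)
# Hypothetical per-sample gradients: a fixed signal plus unit Gaussian noise.
grads = [[1.0 + data_rng.gauss(0.0, 1.0), -2.0 + data_rng.gauss(0.0, 1.0)]
         for _ in range(512)]
theta = 0.1
g, size, err = sample_until_norm_condition(grads, theta, start_size=4)
```

A smaller `theta` forces a more accurate gradient and therefore a larger sample, so the sample size adapts to how large the gradient is relative to its sampling error.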

Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods

Stochastic Sub-Sampled Newton Method with Variance Reduction

Stochastic Newton Proximal Extragradient Method

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes

Derivative-Free Optimization via Adaptive Sampling Strategies

Extremely Fast Convergence Rates for Extremum Seeking Control with Polyak-Ruppert Averaging

Adaptively Accelerating Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling

Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems

Fast convergence of sample-average approximation for saddle-point problems

Fast stochastic dual coordinate descent algorithms for linearly constrained convex optimization

Fast convex optimization via closed-loop time scaling of gradient dynamics

Optimal sampling for stochastic and natural gradient descent

Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing

An Adaptive Sampling Augmented Lagrangian Method for Stochastic Optimization with Deterministic Constraints

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

A Multilevel Low-Rank Newton Method with Super-linear Convergence Rate and its Application to Non-convex Problems

A Single-Loop Stochastic Proximal Quasi-Newton Method for Large-Scale Nonsmooth Convex Optimization

Subsampled Optimization: Statistical Guarantees, Mean Squared Error Approximation, and Sampling Method

On Adaptive Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling

Accelerating Hessian-free optimization for deep neural networks by implicit preconditioning and sampling

Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function