Abstract: We consider minimizing finite-sum and expectation objective functions via Hessian-averaging based subsampled Newton methods. These methods allow for gradient inexactness and have fixed per-iteration Hessian approximation costs. Recent work (Na et al. 2023) demonstrated that Hessian averaging can be used to achieve a fast $\mathcal{O}\left(\sqrt{\tfrac{\log k}{k}}\right)$ local superlinear convergence rate for strongly convex functions with high probability, while maintaining fixed per-iteration Hessian costs. Those methods, however, require exact gradients and strong convexity, which pose challenges for their practical implementation. To address this concern, we consider Hessian-averaged methods that allow gradient inexactness via norm-condition based adaptive-sampling strategies. For the finite-sum problem, we use deterministic sampling techniques, which lead to global linear and sublinear convergence rates for strongly convex and nonconvex functions, respectively. In this setting we derive an improved deterministic local superlinear convergence rate of $\mathcal{O}\left(\tfrac{1}{k}\right)$. For the expectation problem, we use stochastic sampling techniques and derive global linear and sublinear rates for strongly convex and nonconvex functions, as well as an $\mathcal{O}\left(\tfrac{1}{\sqrt{k}}\right)$ local superlinear convergence rate, all in expectation. We present novel analysis techniques that differ from the previous probabilistic results. Additionally, we propose scalable and efficient variations of these methods via diagonal approximations and derive the novel diagonally-averaged Newton (Dan) method for large-scale problems. Our numerical results demonstrate that Hessian averaging not only helps with convergence, but can also lead to state-of-the-art performance on difficult problems such as CIFAR100 classification with ResNets.
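
To make the two ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a ridge-regularized least-squares finite sum, a uniform running average of subsampled Hessians, and an idealized norm condition $\|g_S - \nabla f(x)\| \le \theta \|g_S\|$ that is checked against the full gradient purely for illustration (a practical method would avoid computing the full gradient). All function names, the doubling rule for the gradient sample, and the parameter values are hypothetical choices made for this example.

```python
import numpy as np

def grad(w, A, b, idx, lam):
    # Gradient of (1/|idx|) * sum_i 0.5*(a_i^T w - b_i)^2 + 0.5*lam*||w||^2.
    r = A[idx] @ w - b[idx]
    return A[idx].T @ r / len(idx) + lam * w

def sub_hessian(A, idx, lam):
    # Subsampled Hessian of the same objective (independent of w for least squares).
    return A[idx].T @ A[idx] / len(idx) + lam * np.eye(A.shape[1])

def averaged_newton(A, b, lam=1e-3, hess_batch=100, theta=0.1, iters=25, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    order = rng.permutation(n)   # nested gradient samples: order[:size]
    w = np.zeros(d)
    H_avg = np.zeros((d, d))
    size = 64                    # initial gradient sample size (assumed)
    history = []
    for k in range(iters):
        # Fixed per-iteration Hessian cost: one batch of hess_batch samples,
        # folded into a uniform running average of all Hessian estimates so far.
        H_k = sub_hessian(A, rng.choice(n, size=hess_batch, replace=False), lam)
        H_avg = (k * H_avg + H_k) / (k + 1)

        # Idealized norm condition: the full gradient is computed here only to
        # check the condition exactly; this is an assumption of the sketch.
        g_full = grad(w, A, b, order, lam)
        g = grad(w, A, b, order[:size], lam)
        while size < n and np.linalg.norm(g - g_full) > theta * np.linalg.norm(g):
            size = min(2 * size, n)          # grow the gradient sample
            g = grad(w, A, b, order[:size], lam)

        history.append((size, np.linalg.norm(g_full)))
        w = w - np.linalg.solve(H_avg, g)    # Newton-type step with unit step size
    return w, history

# Usage on synthetic data (hypothetical problem sizes).
rng = np.random.default_rng(1)
n, d = 2000, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
w, history = averaged_newton(A, b)
final_grad_norm = np.linalg.norm(grad(w, A, b, np.arange(n), 1e-3))
```

In this sketch the gradient sample size only grows, so the gradient becomes exact once the sample covers all `n` terms, while the averaged Hessian keeps improving at a fixed cost of `hess_batch` samples per iteration. For a general objective the Hessian estimates would be evaluated at different iterates, which is where the analysis of averaging becomes nontrivial.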

Accelerating Derivative-Free Optimization with Dimension Reduction and Hyperparameter Learning

Stochastic Trust-Region Algorithm in Random Subspaces with Convergence and Expected Complexity Analyses

An Aggressive Reduction on the Complexity of Optimization for Non-Strongly Convex Objectives

High-dimensional Bayesian optimization using low-dimensional feature spaces

Derivative-Free Optimization with Adaptive Experience for Efficient Hyper-Parameter Tuning

A Metaheuristic for Amortized Search in High-Dimensional Parameter Spaces

Curvature-Aware Derivative-Free Optimization

Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

Adaptive Sampling-Based Bi-Fidelity Stochastic Trust Region Method for Derivative-Free Stochastic Optimization

Alternating Differentiation for Optimization Layers

Learning Algorithm Hyperparameters for Fast Parametric Convex Optimization

Convergence Analysis of the Fast Subspace Descent Methods for Convex Optimization Problems

A Scalable Derivative-free Exploration Approach for Reinforcement Learning

The Double-Accelerated Stochastic Method for Regularized Empirical Risk Minimization

Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods

Accelerated Forward-Backward Optimization Using Deep Learning

Deterministic Langevin Unconstrained Optimization with Normalizing Flows

Friction-adaptive descent: a family of dynamics-based optimization methods

Massive Dimensions Reduction and Hybridization with Meta-heuristics in Deep Learning

Experienced Optimization with Reusable Directional Model for Hyper-Parameter Search

An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification