Abstract: Second-order methods can address the shortcomings of first-order methods for the optimization of large-scale machine learning models. However, second-order methods have significantly higher computational costs associated with the computation of second-order information. Subspace methods that are based on randomization have addressed some of these computational costs as they compute search directions in lower dimensions. Even though super-linear convergence rates have been empirically observed, it has not been possible to rigorously show that these variants of second-order methods can indeed achieve such fast rates. Also, it is not clear whether subspace methods can be applied to non-convex cases. To address these shortcomings, we develop a link between multigrid optimization methods and low-rank Newton methods that enables us to prove the super-linear rates of stochastic low-rank Newton methods rigorously. Our method does not require any computations in the original model dimension. We further propose a truncated version of the method that is capable of solving high-dimensional non-convex problems. Preliminary numerical experiments show that our method has a better escape rate from saddle points compared to accelerated gradient descent and Adam and thus returns lower training errors.
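
To illustrate what computing a search direction in a lower dimension can look like, the following is a minimal sketch of a generic randomized subspace Newton step on a strongly convex quadratic. It is not the multilevel method of the paper: the Gaussian sketch matrix `S`, the function name `subspace_newton_step`, the test problem, and all parameter values are assumptions chosen for illustration. The sketch also evaluates the full gradient, so it does not reproduce the paper's claim of avoiding computations in the original dimension.

```python
import numpy as np

def subspace_newton_step(x, grad, hess_times, k, rng):
    """One Newton step restricted to a random k-dimensional subspace.

    grad(x) returns the gradient in R^n; hess_times(x, S) returns H(x) @ S.
    Both callables are hypothetical helpers for this illustration.
    """
    n = x.shape[0]
    S = rng.standard_normal((n, k)) / np.sqrt(k)  # random sketch, n x k
    g_red = S.T @ grad(x)                         # reduced gradient, size k
    H_red = S.T @ hess_times(x, S)                # reduced Hessian, k x k
    d_red = np.linalg.solve(H_red, -g_red)        # Newton system in k dims
    return x + S @ d_red                          # lift the step back to R^n

# Assumed test problem: f(x) = 0.5 x^T A x - b^T x with A positive definite.
rng = np.random.default_rng(0)
n, k = 50, 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
hess_times = lambda x, S: A @ S

x = np.zeros(n)
values = [f(x)]
for _ in range(200):
    x = subspace_newton_step(x, grad, hess_times, k, rng)
    values.append(f(x))
```

On a convex quadratic each such step minimizes the objective exactly over the affine subspace `x + range(S)`, so the recorded values in `values` cannot increase apart from rounding error. Only a `k x k` linear system is solved per iteration, which is where the cost saving over a full Newton step comes from.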

Inertial Newton Algorithms Avoiding Strict Saddle Points

Continuous Newton-like Methods featuring Inertia and Variable Mass

Inexact Newton Regularization in Banach Spaces Based on Two-Point Gradient Method with Uniformly Convex Penalty Terms

Escape Saddle Points by a Simple Gradient-Descent Based Algorithm

On the saddle point problem for non-convex optimization

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

Inertial Algorithms for the Stationary Navier-Stokes Equations

Inexact Newton-type Methods for Optimisation with Nonnegativity Constraints

Inertial self-adaptive algorithms for solving non-smooth convex optimization problems

Newton and interior-point methods for (constrained) nonconvex-nonconcave minmax optimization with stability and instability guarantees

A Multilevel Low-Rank Newton Method with Super-linear Convergence Rate and its Application to Non-convex Problems

A Deterministic Gradient-Based Approach to Avoid Saddle Points

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes

Accelerated inertial subgradient extragradient algorithms with non-monotonic step sizes for equilibrium problems and fixed point problems

A new inertial condition on the subgradient extragradient method for solving pseudomonotone equilibrium problem

A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization

Halpern inertial subgradient extragradient algorithm for solving equilibrium problems in Banach spaces

Multi-step inertial algorithms for equilibrium, fixed point, general systems of variational inequalities and split feasibility problems

Dealing with unbounded gradients in stochastic saddle-point optimization

Inertial Methods with Viscous and Hessian driven Damping for Non-Convex Optimization