Abstract: This paper focuses on the problem of minimizing a finite-sum loss $ \frac{1}{N} \sum_{\xi=1}^N f (\mathbf{x}; \xi) $, where only function evaluations of $ f (\cdot; \xi) $ are allowed. For a fixed $ \xi $, which represents a (batch of) training data, the Hessian matrix $ \nabla^2 f (\mathbf{x}; \xi) $ is usually low-rank. We develop a stochastic zeroth-order cubic Newton method for such problems, and prove its efficiency. More specifically, we show that when $ \nabla^2 f (\mathbf{x}; \xi) \in \mathbb{R}^{n\times n } $ is of rank $r$, $ \mathcal{O}\left(\frac{n}{\eta^{\frac{7}{2}}}\right)+\widetilde{\mathcal{O}}\left(\frac{n^2 r^2 }{\eta^{\frac{5}{2}}}\right) $ function evaluations guarantee a second-order $\eta$-stationary point with high probability. This result improves the dependence on dimensionality compared to the existing state-of-the-art. This improvement is achieved via a new Hessian estimation method, which can be efficiently computed by finite-difference operations, and does not require any incoherence assumptions. Numerical experiments are provided to demonstrate the effectiveness of our algorithm.
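To make the phrase "finite-difference operations" concrete, here is a minimal sketch of how a single Hessian measurement $u^\top \nabla^2 f(\mathbf{x})\, v$ can be approximated from four function evaluations. This is a standard second-order central difference, not the paper's estimator. The helper names (`dot`, `hessian_bilinear_fd`), the step size `delta`, and the toy rank-1 loss are all assumptions made for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def hessian_bilinear_fd(f, x, u, v, delta=1e-3):
    """Approximate u^T H v, with H the Hessian of f at x, from four evaluations of f."""
    def shifted(su, sv):
        # x + delta * (su * u + sv * v)
        return [xi + delta * (su * ui + sv * vi) for xi, ui, vi in zip(x, u, v)]

    return (f(shifted(1, 1)) - f(shifted(1, -1))
            - f(shifted(-1, 1)) + f(shifted(-1, -1))) / (4.0 * delta ** 2)


# Hypothetical toy loss with a rank-1 Hessian: f(x) = 0.5 * (w . x)^2, so H = w w^T.
w = [1.0, -2.0, 0.5, 3.0]


def f(x):
    return 0.5 * dot(w, x) ** 2


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in w]
u = [rng.gauss(0.0, 1.0) for _ in w]
v = [rng.gauss(0.0, 1.0) for _ in w]

estimate = hessian_bilinear_fd(f, x, u, v)
exact = dot(w, u) * dot(w, v)  # u^T (w w^T) v
print(estimate, exact)
```

For a quadratic loss the formula is exact up to floating-point rounding; for a general smooth loss the error depends on `delta` and on the smoothness of the Hessian. A low-rank recovery method would combine many such scalar measurements, but how the paper chooses and combines them is not shown here.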

What problem does this paper attempt to address?

The problem this paper addresses is how to estimate the Hessian matrix efficiently and apply the cubic Newton method in zeroth-order stochastic optimization when the Hessian of the objective function is low-rank on each batch of training data. Specifically, the paper focuses on the following issues:

1. **Hessian Matrix Estimation in Zeroth-Order Optimization**:
   - In many practical scenarios, the objective function has no analytical formula, the formula is too complex, or it must be kept confidential, so optimization can rely only on function value evaluations. This is the setting of zeroth-order optimization.
   - The paper proposes a new Hessian matrix estimation method designed for low-rank Hessian matrices. This method does not require any incoherence assumption and can be computed efficiently through finite-difference operations.
2. **Efficient Estimation of Low-Rank Hessian Matrices**:
   - For each fixed training data point (or batch) $\xi$, the Hessian matrix $\nabla^2 f(\mathbf{x}; \xi)$ is usually low-rank. Existing Hessian estimation methods do not exploit this structure and are therefore too conservative, resulting in high sample complexity.
   - The paper proposes a Hessian estimation method based on the matrix recovery principle, which can accurately recover a Hessian matrix of size $n\times n$ and rank $r$ from $O(nr^2\log^2 n)$ finite-difference computations with high probability.
3. **Improved Zeroth-Order Stochastic Cubic Newton Method**:
   - Based on the proposed low-rank Hessian estimation method, the paper designs a zeroth-order stochastic cubic Newton method for smooth optimization problems.
   - Specifically, when the Hessian matrix has rank $r$, this method finds a second-order $\eta$-stationary point with high probability, and the number of function evaluations required is $O\left(\frac{n}{\eta^{7/2}}\right)+\widetilde{O}\left(\frac{n^2 r^2}{\eta^{5/2}}\right)$. This result improves the dependence on the dimension $n$ compared to the existing state of the art.

### Summary

The main contribution of this paper is an efficient low-rank Hessian matrix estimation method and its application to the zeroth-order stochastic cubic Newton method, which lowers the function evaluation complexity for problems with low-rank Hessian structure.
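As a rough illustration of what a cubic Newton step computes, the sketch below minimizes the standard cubic-regularized model $m(s) = g^\top s + \tfrac{1}{2} s^\top H s + \tfrac{M}{6}\|s\|^3$ by plain gradient descent. Everything here is an assumption for illustration: the source does not specify the subproblem solver, the constant $M$, or the scaling of the cubic term, and in the zeroth-order setting `g` and `H` would be finite-difference estimates rather than the exact toy values used below.

```python
import math


def mat_vec(H, s):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(hij * sj for hij, sj in zip(row, s)) for row in H]


def cubic_model_value(g, H, M, s):
    """m(s) = g.s + 0.5 * s^T H s + (M/6) * ||s||^3."""
    norm_s = math.sqrt(sum(si * si for si in s))
    Hs = mat_vec(H, s)
    linear = sum(gi * si for gi, si in zip(g, s))
    quadratic = 0.5 * sum(si * hsi for si, hsi in zip(s, Hs))
    return linear + quadratic + (M / 6.0) * norm_s ** 3


def cubic_model_grad(g, H, M, s):
    """Gradient of the model: g + H s + (M/2) * ||s|| * s."""
    norm_s = math.sqrt(sum(si * si for si in s))
    Hs = mat_vec(H, s)
    return [gi + hsi + 0.5 * M * norm_s * si for gi, hsi, si in zip(g, Hs, s)]


def cubic_newton_step(g, H, M, lr=0.1, iters=2000):
    """Approximately minimize the cubic model by gradient descent from s = 0."""
    s = [0.0] * len(g)
    for _ in range(iters):
        grad = cubic_model_grad(g, H, M, s)
        s = [si - lr * gi for si, gi in zip(s, grad)]
    return s


# Hypothetical toy inputs: an indefinite Hessian, where a plain Newton step is unreliable.
g = [1.0, 0.5]
H = [[1.0, 0.0], [0.0, -0.5]]
M = 2.0

step = cubic_newton_step(g, H, M)
grad_norm = math.sqrt(sum(gi * gi for gi in cubic_model_grad(g, H, M, step)))
model_decrease = cubic_model_value(g, H, M, step)
print(step, grad_norm, model_decrease)
```

The cubic term keeps the model bounded below even when `H` has negative eigenvalues, which is why the step remains well defined in the nonconvex case. The fixed learning rate and iteration count are tuned for this toy problem only.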

Zeroth-order Stochastic Cubic Newton Method with Low-rank Hessian Estimation

Zeroth-order Low-rank Hessian Estimation via Matrix Recovery

Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization

First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians

Stochastic Variance-Reduced Cubic Regularized Newton Method

Stochastic Sub-Sampled Newton Method with Variance Reduction

Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient

A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization

A Multilevel Low-Rank Newton Method with Super-linear Convergence Rate and its Application to Non-convex Problems

Stochastic Zeroth-order Optimization Via Variance Reduction Method

A Single-Loop Stochastic Proximal Quasi-Newton Method for Large-Scale Nonsmooth Convex Optimization

Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods

A Proximal Stochastic Quasi-Newton Algorithm

Riemannian Accelerated Zeroth-order Algorithm: Improved Robustness and Lower Query Complexity

A Zeroth-Order Variance-Reduced Method for Decentralized Stochastic Non-convex Optimization

Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function

Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing

A hybrid inexact regularized Newton and negative curvature method

An inexact regularized proximal Newton method for nonconvex and nonsmooth optimization

Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity