Zeroth-order Stochastic Cubic Newton Method with Low-rank Hessian Estimation

Yu Liu, Weibin Peng, Tianyu Wang, Jiajia Yu
2024-10-16
Abstract: This paper focuses on the problem of minimizing a finite-sum loss $\frac{1}{N} \sum_{\xi=1}^N f(\mathbf{x}; \xi)$, where only function evaluations of $f(\cdot; \xi)$ are allowed. For a fixed $\xi$, which represents a (batch of) training data, the Hessian matrix $\nabla^2 f(\mathbf{x}; \xi)$ is usually low-rank. We develop a stochastic zeroth-order cubic Newton method for such problems, and prove its efficiency. More specifically, we show that when $\nabla^2 f(\mathbf{x}; \xi) \in \mathbb{R}^{n \times n}$ has rank $r$, $\mathcal{O}\left(\frac{n}{\eta^{\frac{7}{2}}}\right) + \widetilde{\mathcal{O}}\left(\frac{n^2 r^2}{\eta^{\frac{5}{2}}}\right)$ function evaluations guarantee a second-order $\eta$-stationary point with high probability. This result improves the dependence on dimensionality compared to the existing state-of-the-art. This improvement is achieved via a new Hessian estimation method, which can be efficiently computed by finite-difference operations, and does not require any incoherence assumptions. Numerical experiments are provided to demonstrate the effectiveness of our algorithm.
Optimization and Control
What problem does this paper attempt to address?
The paper addresses how to estimate the Hessian matrix efficiently, and how to use that estimate in a cubic Newton method, for zeroth-order stochastic optimization when the Hessian of the objective is low-rank on each batch of training data. Specifically, it focuses on the following issues:

1. **Hessian Matrix Estimation in Zeroth-Order Optimization**:
   - In many practical scenarios the objective function has no analytical formula, has a formula that is too complex to differentiate, or must be kept confidential, so optimization can rely only on function evaluations. This is the setting of zeroth-order optimization.
   - The paper proposes a new Hessian estimation method tailored to low-rank Hessian matrices. The method does not require any incoherence assumption and can be computed efficiently through finite-difference operations.
2. **Efficient Estimation of Low-Rank Hessian Matrices**:
   - For each fixed \(\xi\), which represents a training data point or batch, the Hessian matrix \(\nabla^2 f(\mathbf{x}; \xi)\) is usually low-rank. Existing Hessian estimation methods are often too conservative in this case, which leads to high sample complexity.
   - The paper proposes a Hessian estimation method based on the matrix recovery principle. With high probability, it accurately recovers a Hessian matrix of size \(n \times n\) and rank \(r\) from \(O(n r^2 \log^2 n)\) finite-difference computations.
3. **Improved Zeroth-Order Stochastic Cubic Newton Method**:
   - Based on the proposed low-rank Hessian estimation method, the paper designs a zeroth-order stochastic cubic Newton method for smooth optimization problems.
   - Specifically, when the Hessian matrix has rank \(r\), the method finds a second-order \(\eta\)-stationary point with high probability using \(O\left(\frac{n}{\eta^{7/2}}\right) + \widetilde{O}\left(\frac{n^2 r^2}{\eta^{5/2}}\right)\) function evaluations. This improves the dependence on the dimension \(n\) compared to the existing state of the art.

### Summary

The main contribution of this paper is an efficient low-rank Hessian estimation method and its use inside a zeroth-order stochastic cubic Newton method, which lowers the function-evaluation cost of optimization when the Hessian has low-rank structure.
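
To make the idea of recovering a low-rank Hessian from function values concrete, here is a minimal Python sketch. It is not the estimator proposed in the paper, whose details are not given on this page. It swaps in a simple Nyström-style reconstruction as a stand-in. The function name `lowrank_hessian_sketch`, the parameters `k`, `delta`, `tol`, and `seed`, and the quadratic test objective are all assumptions made for illustration. The sketch uses the standard four-point finite difference \(u^\top \nabla^2 f(\mathbf{x})\, v \approx [f(\mathbf{x}+\delta u+\delta v) - f(\mathbf{x}+\delta u) - f(\mathbf{x}+\delta v) + f(\mathbf{x})]/\delta^2\) to probe the Hessian along \(k\) random directions, then rebuilds the full matrix from those probes.

```python
import numpy as np


def lowrank_hessian_sketch(f, x, k, delta=1e-3, tol=1e-6, seed=None):
    """Estimate a symmetric Hessian of rank <= k at x from function values only.

    Illustrative Nystrom-style sketch (an assumption, not the paper's method).
    Uses 1 + n + k + n*k evaluations of f, where n = x.size.
    """
    rng = np.random.default_rng(seed)
    n = x.size
    omega = rng.standard_normal((n, k))  # random probe directions, one per column
    eye = np.eye(n)

    # Function values needed by the four-point second-order difference.
    f0 = f(x)
    f_e = np.array([f(x + delta * eye[i]) for i in range(n)])
    f_w = np.array([f(x + delta * omega[:, j]) for j in range(k)])
    f_ew = np.array([[f(x + delta * (eye[i] + omega[:, j])) for j in range(k)]
                     for i in range(n)])

    # Y[i, j] approximates e_i^T H omega_j, so Y approximates H @ omega.
    Y = (f_ew - f_e[:, None] - f_w[None, :] + f0) / delta**2

    # Core matrix omega^T H omega (k x k), symmetrized to damp numerical noise.
    core = omega.T @ Y
    core = 0.5 * (core + core.T)

    # Truncated pseudo-inverse: drop eigenvalues that are tiny relative to the largest.
    w, V = np.linalg.eigh(core)
    keep = np.abs(w) > tol * np.abs(w).max()
    core_pinv = (V[:, keep] / w[keep]) @ V[:, keep].T

    # Nystrom reconstruction H ~ Y (omega^T H omega)^+ Y^T, symmetrized.
    H_hat = Y @ core_pinv @ Y.T
    return 0.5 * (H_hat + H_hat.T)


# Hypothetical test problem: a quadratic whose Hessian has rank r.
rng = np.random.default_rng(0)
n, r = 8, 2
A = rng.standard_normal((n, r))
H_true = A @ A.T
b = rng.standard_normal(n)
f = lambda x: 0.5 * x @ H_true @ x + b @ x

x0 = rng.standard_normal(n)
H_hat = lowrank_hessian_sketch(f, x0, k=r + 2, seed=1)
print("relative error:", np.linalg.norm(H_hat - H_true) / np.linalg.norm(H_true))
```

Estimating every Hessian entry by finite differences takes on the order of \(n^2\) function evaluations, while this sketch uses \(1 + n + k + nk\). The reconstruction is exact in exact arithmetic only when the Hessian is symmetric with rank at most \(k\) and the random probes do not miss its range. For non-quadratic objectives the finite-difference step `delta` also introduces truncation error.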