Higher-Order Corrections to Optimisers based on Newton's Method

Stephen Brooks
2024-05-10
Abstract: The Newton, Gauss–Newton and Levenberg–Marquardt methods all use the first derivative of a vector function (the Jacobian) to minimise its sum of squares. When the Jacobian matrix is ill-conditioned, the function varies much faster in some directions than others and the space of possible improvement in sum of squares becomes a long narrow ellipsoid in the linear model. This means that even a small amount of nonlinearity in the problem parameters can cause a proposed point far down the long axis of the ellipsoid to fall outside of the actual curved valley of improved values, even though it is quite nearby. This paper presents a differential equation that 'follows' these valleys, based on the technique of geodesic acceleration, which itself provides a second-order improvement to the Levenberg–Marquardt iteration step. Higher derivatives of this equation are computed that allow \( n \)th-order improvements to the optimisation methods to be derived. These higher-order accelerated methods, up to fourth order, are tested numerically and shown to provide substantial reductions in both the number of steps and computation time.
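The second-order improvement mentioned in the abstract is geodesic acceleration, as introduced for Levenberg–Marquardt by Transtrum and Sethna. The sketch below shows one such step. It assumes an identity damping matrix (implementations often use a diagonal scaling instead), a forward-difference estimate of the second directional derivative with step `h`, and an acceptance test of the form 2‖a‖/‖v‖ ≤ α with α = 0.75. The function name `lm_geodesic_step` and these parameter choices are illustrative assumptions, not code from the paper.

```python
import numpy as np

def lm_geodesic_step(r, jac, x, lam, h=0.1, alpha=0.75):
    """One Levenberg-Marquardt step with a geodesic-acceleration correction.

    r     : residual function R^n -> R^m
    jac   : Jacobian of r, R^n -> R^(m x n)
    lam   : damping parameter (identity damping matrix assumed here)
    h     : finite-difference step for the second directional derivative
    alpha : acceptance threshold on the acceleration/velocity ratio (assumed)
    Returns the proposed point and whether the correction passes the ratio test.
    """
    r0 = r(x)
    J = jac(x)
    A = J.T @ J + lam * np.eye(x.size)   # damped normal-equation matrix
    v = -np.linalg.solve(A, J.T @ r0)    # first-order "velocity" step
    # Second directional derivative r''(x)[v, v] from one extra residual evaluation
    rvv = (2.0 / h) * ((r(x + h * v) - r0) / h - J @ v)
    a = -np.linalg.solve(A, J.T @ rvv)   # geodesic acceleration (same matrix A)
    # Ratio test written without division so that v = 0 is handled safely
    accept = bool(2.0 * np.linalg.norm(a) <= alpha * np.linalg.norm(v))
    return x + v + 0.5 * a, accept
```

For the one-dimensional residual \( f(x) = x^2 - 4 \) started at \( x = 3 \) with no damping, the plain Gauss–Newton point is about 2.167, while the accelerated step lands near 2.051, closer to the root at 2.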
Numerical Analysis
What problem does this paper attempt to address?
The paper addresses the "narrow curved valley" problem in least-squares optimization. When the Jacobian matrix is ill-conditioned, the function changes much faster in some directions than in others, so the region of improvement predicted by the linear model becomes a long, narrow ellipsoid. Even mild nonlinearity can then cause a proposed point far along the ellipsoid's long axis to fall outside the actual curved valley of improved values. To address this, the paper proposes a differential equation that "follows" these valleys, built on the technique of geodesic acceleration, which itself gives a second-order improvement to the Levenberg-Marquardt iteration step. Computing higher derivatives of this equation yields higher-order corrections, and numerical tests of the accelerated methods up to fourth order show substantial reductions in both the number of iterations and computation time. Specifically, the paper presents the following key points:

1. **Natural Optimization Path**: Defines the path implicitly by \( f(x(t)) = (1-t)f(x(0)) \), where \( t \in [0,1] \), so that all components of the error vector are scaled down uniformly.
2. **Higher-Order Derivatives**: Uses Faà di Bruno's formula to compute the second and higher derivatives of the natural path and derives the corresponding higher-order acceleration terms.
3. **Finite Difference Scheme**: Proposes a finite-difference scheme for computing multi-directional derivatives, so that the higher-order correction terms can be evaluated numerically.
4. **Numerical Tests**: Evaluates the higher-order algorithms on a simple test function, in particular across different anisotropy factors \( K \).
5. **Practical Application**: Applies the algorithms to a complex physical problem (ion focusing), demonstrating that the higher-order methods substantially accelerate the optimization.
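The derivatives of the natural path can be written out explicitly. The following is a reconstruction from the stated path equation, not a transcription of the paper's own notation. Differentiate \( f(x(t)) = (1-t)f(x(0)) \) repeatedly at \( t = 0 \), write \( J = f'(x(0)) \) and let \( f^{(k)}[\cdot,\ldots,\cdot] \) denote the symmetric \( k \)-linear derivative of \( f \) at \( x(0) \). Faà di Bruno's formula then gives the following, where each integer coefficient counts the set partitions of that shape:

$$
\begin{aligned}
J\,\dot x &= -f(x(0)),\\
J\,\ddot x + f''[\dot x,\dot x] &= 0,\\
J\,x^{(3)} + 3f''[\dot x,\ddot x] + f'''[\dot x,\dot x,\dot x] &= 0,\\
J\,x^{(4)} + 4f''[\dot x,x^{(3)}] + 3f''[\ddot x,\ddot x] + 6f'''[\dot x,\dot x,\ddot x] + f^{(4)}[\dot x,\dot x,\dot x,\dot x] &= 0.
\end{aligned}
$$

Each line is a linear system with the same matrix \( J \), so it can be solved in the least-squares sense (or with Levenberg-Marquardt damping) for the next derivative. The step is then the Taylor sum \( x(1) \approx x(0) + \dot x + \ddot x/2 + x^{(3)}/6 + x^{(4)}/24 \). In this form \( \dot x \) is the Gauss-Newton step and \( \ddot x \) matches the geodesic-acceleration correction.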
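The higher-order corrections need only a handful of extra residual evaluations. The sketch below is a hypothetical third-order version. It solves for \( x'(0) \), \( x''(0) \) and \( x'''(0) \) of the natural path \( f(x(t)) = (1-t)f(x(0)) \) with one shared matrix and sums the Taylor series to \( t = 1 \). It estimates the Jacobian by central differences, obtains the mixed derivative \( f''[u,v] \) by polarisation of second differences along \( u+v \) and \( u-v \), and uses a central third difference for \( f'''[u,u,u] \). This finite-difference scheme, the step sizes and the names (`natural_path_step`, `d2`, `d3`, `jacobian_fd`) are assumptions for illustration, not necessarily the scheme proposed in the paper.

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Central-difference Jacobian: column j approximates df/dx_j at x."""
    f0 = f(x)
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2.0 * h)
    return J

def d2(f, x, u, v, h=1e-3):
    """f''(x)[u, v] from central second differences plus polarisation."""
    def dd(w):  # f''(x)[w, w]
        return (f(x + h * w) - 2.0 * f(x) + f(x - h * w)) / h**2
    return (dd(u + v) - dd(u - v)) / 4.0

def d3(f, x, u, h=1e-2):
    """f'''(x)[u, u, u] from a central third difference along u."""
    return (f(x + 2 * h * u) - 2.0 * f(x + h * u)
            + 2.0 * f(x - h * u) - f(x - 2 * h * u)) / (2.0 * h**3)

def natural_path_step(f, x, order=3, lam=0.0):
    """Taylor step x + x' + x''/2 + x'''/6 along f(x(t)) = (1 - t) f(x(0))."""
    r0 = f(x)
    J = jacobian_fd(f, x)
    A = J.T @ J + lam * np.eye(x.size)  # same (optionally damped) matrix for every order

    def solve(rhs):
        # Least-squares solution of J s = -rhs (damped when lam > 0)
        return -np.linalg.solve(A, J.T @ rhs)

    x1 = solve(r0)  # x'(0): the Gauss-Newton / Levenberg-Marquardt step
    step = x1.copy()
    if order >= 2:
        x2 = solve(d2(f, x, x1, x1))  # x''(0) from J x'' + f''[x', x'] = 0
        step = step + x2 / 2.0
    if order >= 3:
        # x'''(0) from J x''' + 3 f''[x', x''] + f'''[x', x', x'] = 0
        x3 = solve(3.0 * d2(f, x, x1, x2) + d3(f, x, x1))
        step = step + x3 / 6.0
    return x + step
```

For \( f(x) = x^2 - 4 \) started at \( x = 3 \), orders 1, 2 and 3 land at about 2.167, 2.051 and 2.019, so each extra term moves the point closer to the root. Reusing the same matrix for every order keeps the cost of each correction to one linear solve plus a few residual evaluations.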
In summary, the paper improves Levenberg-Marquardt-type optimization methods by adding higher-order correction terms that address the "narrow curved valley" problem, thereby making the optimization more efficient.