A Solution to the Ill-Conditioning of Gradient-Enhanced Covariance Matrices for Gaussian Processes

André L. Marchildon, David W. Zingg
2023-07-12
Abstract: Gaussian processes provide probabilistic surrogates for various applications including classification, uncertainty quantification, and optimization. Using a gradient-enhanced covariance matrix can be beneficial since it provides a more accurate surrogate relative to its gradient-free counterpart. An acute problem for Gaussian processes, particularly those that use gradients, is the ill-conditioning of their covariance matrices. Several methods have been developed to address this problem for gradient-enhanced Gaussian processes, but they have various drawbacks such as limiting the data that can be used, imposing a minimum distance between evaluation points in the parameter space, or constraining the hyperparameters. In this paper a new method is presented that applies a diagonal preconditioner to the covariance matrix along with a modest nugget to ensure that the condition number of the covariance matrix is bounded, while avoiding the drawbacks listed above. Optimization results for a gradient-enhanced Bayesian optimizer with the Gaussian kernel are compared for three approaches: the new method, a baseline method that constrains the hyperparameters, and a rescaling method that increases the distance between evaluation points. The Bayesian optimizer with the new method reduces the optimality measure, i.e., the $\ell_2$ norm of the gradient, by an additional 5 to 9 orders of magnitude relative to the baseline method, and it does so in fewer iterations than with the rescaling method. The new method is available in the open-source Python library GpGradPy, which can be found at https://github.com/marchildon/gpgradpy/tree/paper_precon. All of the figures in this paper can be reproduced with this library.
Optimization and Control
What problem does this paper attempt to address?
The paper addresses the ill-conditioning of covariance matrices for Gaussian processes (GPs), a problem that is particularly acute when gradient information is included. It proposes a new method that applies a diagonal preconditioner to the covariance matrix and adds a modest nugget, which keeps the condition number below a user-defined threshold. This approach avoids the drawbacks of existing methods: limiting the data that can be used, enforcing a minimum distance between evaluation points in the parameter space, or constraining the hyperparameters. The paper compares a gradient-enhanced Bayesian optimizer with the Gaussian kernel under three approaches: the new method, a baseline method that constrains the hyperparameters, and a rescaling method that increases the distance between evaluation points. With the new method, the optimizer reduces the optimality measure (the ℓ2 norm of the gradient) by an additional 5 to 9 orders of magnitude relative to the baseline method, and it needs fewer iterations than the rescaling method. The new method is implemented in the open-source Python library `GpGradPy`, available at [https://github.com/marchildon/gpgradpy/tree/paper_precon](https://github.com/marchildon/gpgradpy/tree/paper_precon). All the figures in the paper can be reproduced using this library.
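To make the idea of a diagonal preconditioner plus a nugget concrete, here is a minimal NumPy sketch. It is not the `GpGradPy` API and may not match the paper's exact formulas. The function names, the one-dimensional kernel parameterization $k(x, y) = \exp(-\theta (x - y)^2 / 2)$, and the choice of nugget are all assumptions made for illustration. The nugget comes from a generic trace bound: a symmetric positive semidefinite $N \times N$ matrix with unit diagonal has $\lambda_{\max} \le N$, so adding $\eta = N / (\kappa_{\max} - 1)$ to the diagonal guarantees a condition number of at most $\kappa_{\max}$. The paper may derive a tighter bound and hence a smaller nugget.

```python
import numpy as np


def grad_enhanced_gaussian_cov(x, theta):
    """Gradient-enhanced covariance for a 1D Gaussian kernel (illustrative).

    Assumed kernel: k(x, y) = exp(-theta * (x - y)**2 / 2).
    Returns the 2n x 2n covariance of [f(x_1..x_n), f'(x_1..x_n)].
    """
    x = np.asarray(x, dtype=float)
    r = x[:, None] - x[None, :]           # pairwise differences x_i - x_j
    k = np.exp(-0.5 * theta * r**2)       # cov(f(x_i), f(x_j))
    k_fg = theta * r * k                  # cov(f(x_i), f'(x_j)) = dk/dy
    k_gg = theta * (1.0 - theta * r**2) * k  # cov(f'(x_i), f'(x_j))
    return np.block([[k, k_fg], [k_fg.T, k_gg]])


def precondition_with_nugget(cov, cond_max):
    """Scale cov to unit diagonal, then add a nugget that bounds cond().

    Hypothetical helper: uses the generic bound lambda_max <= N for a
    unit-diagonal PSD matrix, which is an assumption of this sketch and
    not necessarily the bound used in the paper.
    """
    p = np.sqrt(np.diag(cov))             # diagonal preconditioner entries
    cov_pre = cov / np.outer(p, p)        # P^{-1} cov P^{-1}, unit diagonal
    n = cov.shape[0]
    nugget = n / (cond_max - 1.0)         # ensures (n + nugget) / nugget <= cond_max
    return cov_pre + nugget * np.eye(n), p, nugget


# Closely spaced points make the raw matrix severely ill-conditioned.
x = np.linspace(0.0, 1e-2, 5)
cond_max = 1e8
cov = grad_enhanced_gaussian_cov(x, theta=4.0)
cov_bounded, p, nugget = precondition_with_nugget(cov, cond_max)

print("cond before:", np.linalg.cond(cov))
print("cond after: ", np.linalg.cond(cov_bounded))
```

In this sketch the evaluation points and the hyperparameter `theta` are left untouched; only the matrix is rescaled and regularized. Because the preconditioner changes the linear system being solved, any right-hand side would need the same diagonal scaling before the solve.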