Abstract: Gaussian processes provide probabilistic surrogates for various applications including classification, uncertainty quantification, and optimization. Using a gradient-enhanced covariance matrix can be beneficial since it provides a more accurate surrogate relative to its gradient-free counterpart. An acute problem for Gaussian processes, particularly those that use gradients, is the ill-conditioning of their covariance matrices. Several methods have been developed to address this problem for gradient-enhanced Gaussian processes, but they have various drawbacks such as limiting the data that can be used, imposing a minimum distance between evaluation points in the parameter space, or constraining the hyperparameters. In this paper a new method is presented that applies a diagonal preconditioner to the covariance matrix along with a modest nugget to ensure that the condition number of the covariance matrix is bounded, while avoiding the drawbacks listed above. Optimization results for a gradient-enhanced Bayesian optimizer with the Gaussian kernel are compared for the new method, a baseline method that constrains the hyperparameters, and a rescaling method that increases the distance between evaluation points. The Bayesian optimizer with the new method converges the optimality, i.e., the $\ell_2$ norm of the gradient, an additional 5 to 9 orders of magnitude relative to the baseline method, and it does so in fewer iterations than with the rescaling method. The new method is available in the open-source Python library GpGradPy, which can be found at [https://github.com/marchildon/gpgradpy/tree/paper_precon](https://github.com/marchildon/gpgradpy/tree/paper_precon). All of the figures in this paper can be reproduced with this library.

What problem does this paper attempt to address?

The paper addresses the ill-conditioning of covariance matrices in Gaussian processes (GPs), which is especially severe for gradient-enhanced GPs. It proposes a new method that keeps the condition number of the covariance matrix below a user-defined threshold by applying a diagonal preconditioner and adding a modest nugget. This approach avoids the drawbacks of existing methods: limiting the data that can be used, enforcing a minimum distance between evaluation points in the parameter space, or constraining the hyperparameters. The paper compares a gradient-enhanced Bayesian optimizer using the new method against a baseline method (which constrains the hyperparameters) and a rescaling method (which increases the distance between evaluation points). With the new method, the optimizer converges the ℓ2 norm of the gradient an additional 5 to 9 orders of magnitude relative to the baseline method, and it does so in fewer iterations than with the rescaling method. The new method is implemented in the open-source Python library `GpGradPy`, available at [https://github.com/marchildon/gpgradpy/tree/paper_precon](https://github.com/marchildon/gpgradpy/tree/paper_precon). All the figures in the paper can be reproduced using this library.
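
The combination of a diagonal preconditioner and a nugget can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: one-dimensional inputs, a Gaussian kernel written as exp(-theta * r^2), and a nugget chosen from the generic bound that a positive semidefinite matrix of size n with unit diagonal has a largest eigenvalue of at most n. The function names `grad_enhanced_gaussian_cov` and `precondition_with_nugget` are hypothetical and are not part of the GpGradPy API; the bound derived in the paper may be tighter than the one used here.

```python
import numpy as np


def grad_enhanced_gaussian_cov(x, theta):
    """Gradient-enhanced covariance for a 1D Gaussian kernel exp(-theta * r**2).

    Rows and columns are ordered as [function values, derivatives].
    """
    r = x[:, None] - x[None, :]                # pairwise differences x_i - x_j
    k = np.exp(-theta * r**2)                  # cov(f_i, f_j)
    k_dy = 2.0 * theta * r * k                 # cov(f_i, f'_j)
    k_dd = (2.0 * theta - 4.0 * theta**2 * r**2) * k  # cov(f'_i, f'_j)
    return np.block([[k, k_dy], [k_dy.T, k_dd]])


def precondition_with_nugget(cov, cond_max):
    """Scale cov to unit diagonal, then add a nugget that bounds its condition number.

    Assumption for this sketch: after scaling, the matrix is positive
    semidefinite with trace n, so its largest eigenvalue is at most n and
    cond(corr + nugget * I) <= (n + nugget) / nugget = cond_max.
    """
    p = np.sqrt(np.diag(cov))                  # entries of the diagonal preconditioner
    corr = cov / np.outer(p, p)                # P^{-1} cov P^{-1}, unit diagonal
    n = corr.shape[0]
    nugget = n / (cond_max - 1.0)
    return corr + nugget * np.eye(n), p, nugget


# Closely spaced points make the unmodified covariance matrix badly conditioned.
x = np.linspace(0.0, 0.1, 6)
cov = grad_enhanced_gaussian_cov(x, theta=50.0)
cov_safe, p, nugget = precondition_with_nugget(cov, cond_max=1e10)

cond_safe = np.linalg.cond(cov_safe)
print(f"condition number without modification: {np.linalg.cond(cov):.3e}")
print(f"condition number with preconditioner and nugget: {cond_safe:.3e}")
```

In this sketch the preconditioner only rescales rows and columns, so no data points are discarded and the hyperparameter `theta` is left unconstrained; the nugget alone sets the upper bound on the condition number.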

A Solution to the Ill-Conditioning of Gradient-Enhanced Covariance Matrices for Gaussian Processes
