Abstract: Learning the kernel parameters for Gaussian processes is often the computational bottleneck in applications such as online learning, Bayesian optimization, or active learning. Amortizing parameter inference over different datasets is a promising approach to dramatically speed up training time. However, existing methods restrict the amortized inference procedure to a fixed kernel structure. The amortization network must be redesigned manually and trained again in case a different kernel is employed, which leads to a large overhead in design time and training time. We propose amortizing kernel parameter inference over a complete kernel-structure-family rather than a fixed kernel structure. We do that via defining an amortization network over pairs of datasets and kernel structures. This enables fast kernel inference for each element in the kernel family without retraining the amortization network. As a by-product, our amortization network is able to do fast ensembling over kernel structures. In our experiments, we show drastically reduced inference time combined with competitive test performance for a large set of kernels and datasets.

What problem does this paper attempt to address?

The paper addresses **the computational bottleneck of learning kernel parameters in Gaussian processes (GPs)**. In applications such as online learning, Bayesian optimization, or active learning, learning the kernel parameters is often the most expensive step. Existing methods learn these parameters by maximizing the marginal likelihood or the evidence lower bound (ELBO), which often takes hundreds of optimization steps and results in long training times. To solve this problem, the paper proposes **amortized inference over an entire family of kernel structures** rather than over a single fixed kernel structure.

Specific contributions include:

1. **An amortization network**: The network is defined on the joint space of datasets and kernel structures and explicitly incorporates the invariances and equivariances of the underlying space.
2. **Experimental validation**: The paper demonstrates the effectiveness of amortized inference on multiple simulated and real-world datasets and kernel structures.
3. **Fast ensembling over kernel structures**: The paper demonstrates the generality of the method by defining a fast ensembling procedure over kernel structures.

### Specific problem description

#### Limitations of traditional methods

- **Fixed kernel structure**: Existing methods such as Liu et al. [2020b] can only perform amortized inference for a fixed kernel structure. If a different kernel structure is used, the network must be redesigned and retrained, which adds a large overhead in design time and training time.
- **Computational bottleneck**: Traditional kernel parameter learning methods (such as marginal likelihood maximization) require a large number of optimization steps, and the computational cost is already very high for medium-sized datasets.

#### Advantages of the new method

- **Joint-space amortized inference**: Because inference is amortized over the joint space of datasets and kernel structures, the network does not have to be redesigned and retrained every time the kernel structure changes.
- **Fast inference**: Kernel parameters can be inferred quickly for each element of the kernel family without retraining the network.
- **Generality**: The same network supports fast ensembling over different kernel structures, which broadens its applicability.

### Mathematical formula representation

To understand the problem more clearly, some key mathematical formulas are listed here:

1. **Definition of Gaussian process**:
\[ f\sim\text{GP}(m,k) \]
where \(m(x)\) is the mean function and \(k(x,x')\) is the kernel function.

2. **Marginal likelihood**:
\[ p(y|X,\theta,\sigma^{2})=\mathcal{N}(y;m(X),k_{\theta}(X,X)+\sigma^{2}I) \]
where \(y\) is the vector of observed values, \(X\) is the input data, \(\theta\) is the kernel parameter, and \(\sigma^{2}\) is the noise variance.

3. **Optimization problem**:
\[ (\theta^{*},\sigma^{2*})=\arg\max_{(\theta,\sigma^{2})\in\Phi}\log p(y|X,\theta,\sigma^{2}) \]

4. **Output of the amortized inference network**:
\[ (\hat{\theta}_{S},\hat{\sigma}^{2}) = g_{\psi}(D,S) \]
where \(D\) is the dataset, \(S\) is the kernel expression, and \(g_{\psi}\) is the amortized inference network.

In this way, the proposed method drastically reduces inference time while retaining competitive test performance, and it makes amortized inference applicable across kernel structures without redesigning or retraining the network.
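To make the cost of the optimization problem in formula 3 concrete, the following is a minimal NumPy sketch of the log marginal likelihood from formula 2. It is an illustration under stated assumptions, not the paper's implementation: it assumes a zero mean function and a squared-exponential (RBF) kernel with \(\theta\) = (lengthscale, variance), and the names `rbf_kernel` and `log_marginal_likelihood` are hypothetical. Each evaluation needs a Cholesky factorization of an \(n \times n\) matrix, which costs \(O(n^{3})\); an iterative optimizer repeats this at every step, whereas the amortization network \(g_{\psi}\) is meant to return the parameters directly.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale, variance):
    """Squared-exponential kernel (an assumed example of k_theta)."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def log_marginal_likelihood(X, y, lengthscale, variance, noise_variance):
    """log N(y; 0, k_theta(X, X) + sigma^2 I), assuming m(x) = 0."""
    n = X.shape[0]
    K = rbf_kernel(X, X, lengthscale, variance) + noise_variance * np.eye(n)
    L = np.linalg.cholesky(K)  # O(n^3): the dominant cost per evaluation
    # alpha = K^{-1} y, computed via two solves with the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # log det K = 2 * sum(log diag(L)), so 0.5 * log det K = sum(log diag(L)).
    return (
        -0.5 * y @ alpha
        - np.sum(np.log(np.diag(L)))
        - 0.5 * n * np.log(2.0 * np.pi)
    )


if __name__ == "__main__":
    # Toy data (made up for illustration only).
    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(40, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

    # A crude grid search stands in for the arg max in formula 3;
    # every candidate requires one full O(n^3) evaluation.
    candidates = [0.1, 0.3, 1.0, 3.0]
    scores = [log_marginal_likelihood(X, y, ls, 1.0, 0.01) for ls in candidates]
    print("best lengthscale on the grid:", candidates[int(np.argmax(scores))])
```

The grid search is only a stand-in for the gradient-based optimizers used in practice; the point is that each candidate \((\theta,\sigma^{2})\) triggers a fresh factorization, which is the repeated work that amortized inference aims to avoid.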

Amortized Inference for Gaussian Process Hyperparameters of Structured Kernels

Amortized Bayesian Local Interpolation NetworK: Fast covariance parameter estimation for Gaussian Processes

Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression

Hierarchical-Hyperplane Kernels for Actively Learning Gaussian Process Models of Nonstationary Systems

Exploiting Hankel-Toeplitz Structures for Fast Computation of Kernel Precision Matrices

Fast Evaluation of Additive Kernels: Feature Arrangement, Fourier Methods, and Kernel Derivatives

Kernel Looping: Eliminating Synchronization Boundaries for Peak Inference Performance

Large-Scale Gaussian Processes via Alternating Projection

Amortized Bayesian Workflow (Extended Abstract)

Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes

Neural Methods for Amortized Inference

Scaling Gaussian Processes for Learning Curve Prediction via Latent Kronecker Structure

Efficient training of Gaussian processes with tensor product structure

Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces

Amortized Reparametrization: Efficient and Scalable Variational Inference for Latent SDEs

Refining Amortized Posterior Approximations using Gradient-Based Summary Statistics

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization

Sketch In, Sketch Out: Accelerating both Learning and Inference for Structured Prediction with Kernels

Representing Additive Gaussian Processes by Sparse Matrices

Amortized Variational Inference for Deep Gaussian Processes

Structure Parameter Optimized Kernel Based Online Prediction With a Generalized Optimization Strategy for Nonstationary Time Series