Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression

Lu Zou, Liang Ding

2024-03-31

Abstract: Additive Gaussian Processes (GPs) are popular approaches for nonparametric feature selection. The common training method for these models is Bayesian Back-fitting. However, the convergence rate of Back-fitting in training additive GPs is still an open problem. By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively. Consequently, Back-fitting requires a minimum of $\mathcal{O}(n\log n)$ iterations to achieve convergence. Based on KPs, we further propose an algorithm called Kernel Multigrid (KMG). This algorithm enhances Back-fitting by incorporating a sparse Gaussian Process Regression (GPR) to process the residuals after each Back-fitting iteration. It is applicable to additive GPs with both structured and scattered data. Theoretically, we prove that KMG reduces the required iterations to $\mathcal{O}(\log n)$ while preserving the time and space complexities at $\mathcal{O}(n\log n)$ and $\mathcal{O}(n)$ per iteration, respectively. Numerically, by employing a sparse GPR with merely 10 inducing points, KMG can produce accurate approximations of high-dimensional targets within 5 iterations.
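
The step from the rate bound to the iteration count can be spelled out as follows. This is only a sketch under assumptions that the abstract does not state: the constant $c > 0$ hidden in $\mathcal{O}(\frac{1}{n})$, the target tolerance $\varepsilon$, and the choice $\varepsilon = n^{-k}$ are all illustrative. If the error after $t$ iterations is no smaller than $(1-\frac{c}{n})^t$ with $n > c$, then reaching tolerance $\varepsilon$ requires

$$
\left(1-\frac{c}{n}\right)^t \le \varepsilon
\;\Longleftrightarrow\;
t \ge \frac{\log(1/\varepsilon)}{-\log\left(1-\frac{c}{n}\right)}
\ge \left(\frac{n}{c}-1\right)\log\frac{1}{\varepsilon},
$$

where the last inequality uses $-\log(1-x) \le \frac{x}{1-x}$ for $0 < x < 1$. If the tolerance is assumed to shrink polynomially with the data size, $\varepsilon = n^{-k}$ for some fixed $k > 0$, this gives $t \ge k\left(\frac{n}{c}-1\right)\log n$, which grows like $n\log n$.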

Machine Learning

What problem does this paper attempt to address?

This paper addresses the slow training of additive Gaussian Processes (GPs) on high-dimensional data. Additive GPs are a nonparametric feature selection method commonly used in high-dimensional generalized additive models. They are usually trained with Bayesian Back-fitting, whose convergence rate has been an open problem. Using a technique called Kernel Packets (KP), the authors prove that Back-fitting converges no faster than $(1-\mathcal{O}(1/n))^t$, where $n$ is the data size and $t$ is the number of iterations. This implies that at least $\mathcal{O}(n\log n)$ iterations are required to achieve convergence.

To address this problem, the paper proposes a new algorithm called Kernel Multigrid (KMG). KMG augments Back-fitting with a sparse Gaussian Process Regression (GPR) step that processes the residuals after each Back-fitting iteration. Theoretical analysis shows that KMG reduces the required number of iterations to $\mathcal{O}(\log n)$ while keeping the time and space complexity per iteration at $\mathcal{O}(n\log n)$ and $\mathcal{O}(n)$, respectively. The paper also points out that although Bayesian Back-fitting performs well in prediction and classification tasks, it has difficulty recovering global features; the sparse GPR step in KMG is what addresses this.

Numerical experiments demonstrate that KMG can accurately approximate high-dimensional targets within 5 iterations using only 10 inducing points. In summary, the paper settles the convergence rate of Bayesian Back-fitting for additive GPs and proposes a more efficient training method, KMG, which improves computational efficiency on large-scale, high-dimensional data.
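
The Back-fitting iteration discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It uses dense $\mathcal{O}(n^3)$ linear algebra instead of Kernel Packets, it omits the sparse GPR correction that defines KMG, and the Matérn-1/2 kernel, the hyperparameter values, and all function names (`matern12_kernel`, `backfit_additive_gp`, `direct_additive_gp`) are assumptions made for the example. Each sweep updates one component $f_d$ at a time by smoothing the partial residual, which is a block Gauss-Seidel iteration whose fixed point is the additive GP posterior mean.

```python
import numpy as np


def matern12_kernel(x, z, lengthscale=1.0):
    """Matern-1/2 (exponential) kernel matrix for 1-D inputs (assumed kernel)."""
    return np.exp(-np.abs(x[:, None] - z[None, :]) / lengthscale)


def backfit_additive_gp(X, y, noise_var=0.1, lengthscale=1.0, n_iters=50):
    """Back-fitting for the posterior mean of an additive GP (dense sketch).

    Returns an (n, D) array whose column d holds the component f_d evaluated
    at the training inputs.
    """
    n, D = X.shape
    kernels = [matern12_kernel(X[:, d], X[:, d], lengthscale) for d in range(D)]
    # Per-dimension smoothers S_d = K_d (K_d + noise_var I)^{-1}.
    # K_d commutes with (K_d + noise_var I)^{-1}, so solve(K_d + noise_var I, K_d)
    # gives the same matrix.
    smoothers = [np.linalg.solve(K + noise_var * np.eye(n), K) for K in kernels]
    F = np.zeros((n, D))
    for _ in range(n_iters):
        for d in range(D):
            # Partial residual: subtract every component except the d-th.
            partial_residual = y - F.sum(axis=1) + F[:, d]
            F[:, d] = smoothers[d] @ partial_residual
    return F


def direct_additive_gp(X, y, noise_var=0.1, lengthscale=1.0):
    """Reference solution: f_d = K_d (sum_d K_d + noise_var I)^{-1} y."""
    n, D = X.shape
    kernels = [matern12_kernel(X[:, d], X[:, d], lengthscale) for d in range(D)]
    alpha = np.linalg.solve(sum(kernels) + noise_var * np.eye(n), y)
    return np.column_stack([K @ alpha for K in kernels])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(30, 3))
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(30)
    F_ref = direct_additive_gp(X, y)
    for n_iters in (5, 50, 500):
        F = backfit_additive_gp(X, y, n_iters=n_iters)
        print(n_iters, np.max(np.abs(F - F_ref)))
```

Running the script prints the gap between the Back-fitting iterate and the direct solve for a few iteration counts, which is one way to observe how many sweeps plain Back-fitting needs on a given data set.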
