Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression

Lu Zou, Liang Ding
2024-03-31
Abstract: Additive Gaussian Processes (GPs) are popular approaches for nonparametric feature selection. The common training method for these models is Bayesian Back-fitting. However, the convergence rate of Back-fitting in training additive GPs is still an open problem. By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively. Consequently, Back-fitting requires a minimum of $\mathcal{O}(n\log n)$ iterations to achieve convergence. Based on KPs, we further propose an algorithm called Kernel Multigrid (KMG). This algorithm enhances Back-fitting by incorporating a sparse Gaussian Process Regression (GPR) to process the residuals after each Back-fitting iteration. It is applicable to additive GPs with both structured and scattered data. Theoretically, we prove that KMG reduces the required iterations to $\mathcal{O}(\log n)$ while preserving the time and space complexities at $\mathcal{O}(n\log n)$ and $\mathcal{O}(n)$ per iteration, respectively. Numerically, by employing a sparse GPR with merely 10 inducing points, KMG can produce accurate approximations of high-dimensional targets within 5 iterations.
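One plausible reading of the step from the rate bound to the iteration count is sketched here. The constant $c$ with $0 < c < n$ and the tolerance $\varepsilon$ are illustrative symbols introduced for this sketch and do not come from the paper. If the error after $t$ iterations is at least $(1-c/n)^t$, then reaching a tolerance $\varepsilon \in (0,1)$ requires

$$
\Bigl(1-\frac{c}{n}\Bigr)^{t} \le \varepsilon
\;\Longleftrightarrow\;
t \ge \frac{\log(1/\varepsilon)}{-\log(1-c/n)} \approx \frac{n}{c}\,\log\frac{1}{\varepsilon}
\quad \text{(for } c/n \text{ small)}.
$$

Under the further assumption that the tolerance shrinks polynomially with the data size, for example $\varepsilon = n^{-1}$, this gives $t \gtrsim \frac{n}{c}\log n$, which matches the $\mathcal{O}(n\log n)$ order stated in the abstract.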
Machine Learning
What problem does this paper attempt to address?
This paper focuses on accelerating the training of additive Gaussian Processes (GPs) on high-dimensional data. Additive GPs are a nonparametric approach to feature selection, commonly used in high-dimensional generalized additive models. However, the convergence rate of the Bayesian Back-fitting algorithm used to train them has been an open problem. Using a technique called "Kernel Packets" (KP), the authors prove that Back-fitting converges no faster than \((1-\mathcal{O}(1/n))^t\), where \(n\) is the data size and \(t\) is the number of iterations. This implies that at least \(\mathcal{O}(n\log n)\) iterations are required to achieve convergence.

To address this problem, the paper proposes a new algorithm called "Kernel Multigrid" (KMG). KMG augments Back-fitting with a sparse Gaussian Process Regression (GPR) that processes the residuals after each Back-fitting iteration. Theoretical analysis shows that KMG reduces the required number of iterations to \(\mathcal{O}(\log n)\) while keeping the time and space complexities per iteration at \(\mathcal{O}(n\log n)\) and \(\mathcal{O}(n)\), respectively. The paper also points out that although Bayesian Back-fitting performs well in prediction and classification tasks, it has difficulty capturing global features, a weakness that KMG addresses through the sparse GPR. Numerical experiments show that KMG can accurately approximate high-dimensional targets within 5 iterations using only 10 inducing points.

In summary, the paper characterizes the slow convergence of Bayesian Back-fitting for additive GPs and proposes a more efficient training method, KMG, which improves computational efficiency on large-scale, high-dimensional data.
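The structure described above (a Back-fitting sweep followed by a sparse GPR step on the residuals) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation, and several choices are assumptions made only for illustration: a Matérn-1/2 kernel with unit length scale, dense linear solves in place of Kernel Packets (so each sweep costs far more than \(\mathcal{O}(n\log n)\)), the first `m` data points used as inducing points, and a Galerkin-style coarse correction standing in for the sparse GPR. All function names are hypothetical.

```python
import numpy as np

def matern12(x, z, ell=1.0):
    """Matern-1/2 (exponential) kernel between 1-D point sets x and z."""
    return np.exp(-np.abs(x[:, None] - z[None, :]) / ell)

def direct_components(Ks, y, noise):
    """Dense reference: f_d = K_d (sum_d K_d + noise * I)^{-1} y."""
    alpha = np.linalg.solve(sum(Ks) + noise * np.eye(len(y)), y)
    return np.stack([K @ alpha for K in Ks])

def backfit_sweep(F, Ks, y, noise):
    """One Back-fitting (block Gauss-Seidel) sweep over the D components."""
    F = F.copy()
    eye = np.eye(len(y))
    for d, K in enumerate(Ks):
        partial = y - F.sum(axis=0) + F[d]  # residual with component d left out
        F[d] = K @ np.linalg.solve(K + noise * eye, partial)
    return F

def coarse_correct(F, Ks, y, noise, idx):
    """Assumed coarse step: fit the residual in the span of kernel functions
    centred at the inducing points X[idx], with separate weights per dimension."""
    D, m = len(Ks), len(idx)
    B = np.hstack([K[:, idx] for K in Ks])       # n x (D*m) basis matrix
    M = B.T @ B
    for d, K in enumerate(Ks):
        s = slice(d * m, (d + 1) * m)
        M[s, s] += noise * K[np.ix_(idx, idx)]   # prior (regularisation) block
    r = y - F.sum(axis=0)                        # current residual
    rhs = B.T @ r - noise * F[:, idx].ravel()
    w = np.linalg.solve(M, rhs).reshape(D, m)
    return F + np.stack([K[:, idx] @ w[d] for d, K in enumerate(Ks)])

def fit(Ks, y, noise, n_iters, idx=None):
    """Run n_iters sweeps; apply the coarse correction after each if idx is given."""
    F = np.zeros((len(Ks), len(y)))
    for _ in range(n_iters):
        F = backfit_sweep(F, Ks, y, noise)
        if idx is not None:
            F = coarse_correct(F, Ks, y, noise, idx)
    return F

# Synthetic additive target (illustrative data, not from the paper).
rng = np.random.default_rng(0)
n, D, m, noise = 200, 5, 10, 0.1
X = rng.uniform(size=(n, D))
y = np.sin(2 * np.pi * X).sum(axis=1) + np.sqrt(noise) * rng.standard_normal(n)
Ks = [matern12(X[:, d], X[:, d]) for d in range(D)]
idx = np.arange(m)                               # first m points as inducing points

F_ref = direct_components(Ks, y, noise)
F_bf = fit(Ks, y, noise, n_iters=5)
F_cc = fit(Ks, y, noise, n_iters=5, idx=idx)

def rel_err(F):
    """Relative error of the stacked components against the dense solve."""
    return np.linalg.norm(F - F_ref) / np.linalg.norm(F_ref)

print("Back-fitting only, 5 sweeps:      ", rel_err(F_bf))
print("With coarse correction, 5 sweeps: ", rel_err(F_cc))
```

The script prints the relative error of each variant against the dense direct solve after 5 iterations. Both steps leave the exact posterior means unchanged, so the coarse correction does not alter the limit of the iteration. How much it helps in practice depends on the kernel, the noise level, and the choice of inducing points, so the sketch should be read as a structural illustration rather than a reproduction of the reported results.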