Large-Scale Gaussian Processes via Alternating Projection
Kaiwen Wu, Jonathan Wenger, Haydn Jones, Geoff Pleiss, Jacob R. Gardner
2023-10-26
Abstract: Training and inference in Gaussian processes (GPs) require solving linear
systems with $n\times n$ kernel matrices. To address the prohibitive
$\mathcal{O}(n^3)$ time complexity, recent work has employed fast iterative
methods such as conjugate gradients (CG). However, as datasets grow in
size, the kernel matrices become increasingly ill-conditioned and still
require $\mathcal{O}(n^2)$ space without partitioning. Thus, while CG increases
the size of datasets GPs can be trained on, modern datasets reach scales beyond
its applicability. In this work, we propose an iterative method that only
accesses subblocks of the kernel matrix, effectively enabling mini-batching.
Our algorithm, based on alternating projection, has $\mathcal{O}(n)$
per-iteration time and space complexity, solving many of the practical
challenges of scaling GPs to very large datasets. Theoretically, we prove the
method enjoys linear convergence. Empirically, we demonstrate its fast
convergence in practice and robustness to ill-conditioning. On large-scale
benchmark datasets with up to four million data points, our approach
accelerates GP training and inference by factors of up to $27\times$ and
$72 \times$, respectively, compared to CG.
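
To make the idea of accessing only subblocks of the kernel matrix concrete, the following is a minimal sketch of a generic block alternating-projection (block Gauss-Seidel) solver for $(K + \sigma^2 I) w = y$. It is not the authors' implementation: the RBF kernel, the fixed contiguous block partition, the cyclic block order, and the names `rbf_kernel`, `alternating_projection_solve`, `noise`, `block_size`, and `epochs` are all assumptions chosen for illustration. Each inner step builds only a `block_size x n` slice of the kernel matrix, so for a fixed block size the per-step time and memory grow linearly in $n$.

```python
import numpy as np


def rbf_kernel(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel between the rows of xa and the rows of xb."""
    sq_dists = (
        np.sum(xa**2, axis=1)[:, None]
        + np.sum(xb**2, axis=1)[None, :]
        - 2.0 * xa @ xb.T
    )
    # Clip tiny negative values caused by round-off before exponentiating.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def alternating_projection_solve(x, y, noise=0.1, block_size=50, epochs=50):
    """Approximately solve (K + noise * I) w = y, one block of rows at a time.

    Illustrative sketch: the full n x n kernel matrix is never formed; only a
    (block_size x n) slice exists at any moment.
    """
    n = x.shape[0]
    w = np.zeros(n)
    r = np.array(y, dtype=float)  # residual y - (K + noise * I) w, with w = 0
    blocks = [np.arange(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    for _ in range(epochs):
        for idx in blocks:
            # Rows `idx` of the regularised kernel matrix A = K + noise * I.
            a_rows = rbf_kernel(x[idx], x)
            a_rows[np.arange(len(idx)), idx] += noise
            a_bb = a_rows[:, idx]  # small diagonal block A[idx, idx]
            # Solve the small system so the residual on this block becomes zero.
            delta = np.linalg.solve(a_bb, r[idx])
            w[idx] += delta
            # A is symmetric, so A[:, idx] equals a_rows.T.
            r -= a_rows.T @ delta
    return w, r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_demo = rng.uniform(-3.0, 3.0, size=(200, 1))
    y_demo = np.sin(x_demo[:, 0]) + 0.1 * rng.standard_normal(200)
    w_demo, r_demo = alternating_projection_solve(x_demo, y_demo, noise=1.0)
    print("residual norm:", np.linalg.norm(r_demo))
```

In this sketch each block update is an orthogonal projection of the current error, measured in the norm induced by $K + \sigma^2 I$, so that error never increases from one step to the next. How quickly it shrinks depends on the conditioning and the block choice, which the sketch does not attempt to optimise.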
Machine Learning