Abstract: The Gaussian process (GP) is a widely used probabilistic machine learning method for stochastic function approximation, stochastic modeling, and analyzing real-world measurements of nonlinear processes. Unlike many other machine learning methods, GPs include an implicit characterization of uncertainty, making them extremely useful across many areas of science, technology, and engineering. Traditional implementations of GPs involve stationary kernels (also termed covariance functions), which limit their flexibility, and exact methods for inference, which prevent application to data sets with more than about ten thousand points. Modern approaches to address stationarity assumptions generally fail to accommodate large data sets, while all attempts to address scalability focus on approximating the Gaussian likelihood, which can involve subjectivity and lead to inaccuracies. In this work, we explicitly derive an alternative kernel that can discover and encode both sparsity and nonstationarity. We embed the kernel within a fully Bayesian GP model and leverage high-performance computing resources to enable the analysis of massive data sets. We demonstrate the favorable performance of our novel kernel relative to existing exact and approximate GP methods across a variety of synthetic data examples. Furthermore, we conduct space-time prediction based on more than one million measurements of daily maximum temperature and verify that our results outperform state-of-the-art methods in the Earth sciences. More broadly, having access to exact GPs that use ultra-scalable, sparsity-discovering, nonstationary kernels allows GP methods to truly compete with a wide variety of machine learning methods.

Linear-Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning While Providing Uncertainty Quantitation and Improved Interpretability

Thin and Deep Gaussian Processes

Compactly-supported nonstationary kernels for computing exact Gaussian processes on big data

ProSpar-GP: scalable Gaussian process modeling with massive non-stationary datasets

A Unifying Perspective on Non-Stationary Kernels for Deeper Gaussian Processes

Learning inducing points and uncertainty on molecular data by scalable variational Gaussian processes

Leveraging Locality and Robustness to Achieve Massively Scalable Gaussian Process Regression

Gaussian Processes with Spectral Delta kernel for higher accurate Potential Energy surfaces for large molecules

When Gaussian Process Meets Big Data: A Review of Scalable GPs

Reinforcement Learning via Gaussian Processes with Neural Network Dual Kernels

Efficient Two-Stage Gaussian Process Regression Via Automatic Kernel Search and Subsampling

Robust and Conjugate Gaussian Process Regression

mGPfusion: Predicting protein stability changes with Gaussian process kernel learning and data fusion

Gaussian process: an alternative approach for QSAM modeling of peptides

Scaling Gaussian Process Regression with Derivatives

Large-Scale Gaussian Processes via Alternating Projection

Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains

Gaussian Process with Graph Convolutional Kernel for Relational Learning

Understanding and comparing scalable Gaussian process regression for big data

Gaussian Processes for Analyzing Positioned Trajectories in Sports