Abstract: Three challenges limit the use of Gaussian process regression (GPR) on large-scale datasets. First, inverting the covariance matrix of n training points costs O(n^3) time, which becomes a performance bottleneck as n grows. Second, traditional local approximation methods, although widely used, often yield inconsistent predictions. Third, many aggregation strategies fail to discriminate between experts (i.e., local models) when evaluating their importance, which reduces overall prediction accuracy. To address these challenges, this article proposes TDSFSIRLA, a method that integrates third-degree stochastic fully symmetric interpolatory rules (TDSFSI), local approximation, and Tsallis mutual information. TDSFSIRLA first introduces an efficient third-degree stochastic fully symmetric interpolatory rule, which approximates the Gaussian kernel by generating feature maps of adaptive dimension. This reduces the number of quadrature nodes required, and hence the computational cost, while maintaining high approximation accuracy, which makes the approach suitable for large-scale datasets. To overcome the inconsistency of local approximation methods, the article adopts the Generalized Robust Bayesian Committee Machine (GRBCM) as the aggregation framework for local experts. The consistency and robustness of GRBCM keep the predictions of the local models in agreement and improve the stability and reliability of the overall prediction.
Finally, to address the weighting of experts, this article introduces Tsallis mutual information as the metric for weight allocation. Because it is sensitive to information complexity, Tsallis mutual information assigns each local expert a weight that matches its contribution, which reduces the prediction bias caused by poorly discriminated weights and further improves prediction accuracy. Experiments were conducted on multiple synthetic datasets and seven representative real datasets. The results show that TDSFSIRLA significantly reduces time complexity while also achieving high prediction accuracy, supporting its use for large-scale Gaussian process regression.
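
As a rough illustration of the feature-map idea only, the sketch below approximates the Gaussian kernel with an explicit feature map and then performs regression in feature space, so that the cost scales as O(nD^2 + D^3) for D features rather than O(n^3). It is not the TDSFSI rule: the nodes and weights of that rule are not given in this abstract, so random Fourier features are swapped in as a stand-in. The function names (`sample_rff`, `feature_map`, `fit_predict`), the toy data, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def sample_rff(dim, n_features, lengthscale, rng):
    """Draw frequencies and phases for random Fourier features (stand-in for TDSFSI)."""
    W = rng.normal(scale=1.0 / lengthscale, size=(dim, n_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)               # random phases
    return W, b

def feature_map(X, W, b):
    """Map X (n, dim) to Z (n, D) so that Z @ Z.T approximates the Gaussian kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def fit_predict(X_train, y_train, X_test, W, b, noise_var):
    """Bayesian linear regression on the features: an approximate GPR predictor."""
    Z = feature_map(X_train, W, b)                    # (n, D)
    A = Z.T @ Z + noise_var * np.eye(Z.shape[1])      # (D, D) system, not (n, n)
    L = np.linalg.cholesky(A)
    w_mean = np.linalg.solve(L.T, np.linalg.solve(L, Z.T @ y_train))
    Zs = feature_map(X_test, W, b)                    # (m, D)
    mean = Zs @ w_mean
    V = np.linalg.solve(L, Zs.T)                      # (D, m)
    var = noise_var * np.sum(V ** 2, axis=0)          # latent-function variance
    return mean, var

# Toy usage on synthetic 1-D data (hypothetical values).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
W, b = sample_rff(dim=1, n_features=1000, lengthscale=1.0, rng=rng)
mean, var = fit_predict(X, y, X_test, W, b, noise_var=0.01)
```

In a local-approximation setting, one such predictor would be fitted per expert on its own subset of the data, and the per-expert means and variances would then be passed to an aggregation scheme such as GRBCM; that step is not shown here.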
