Abstract: Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, allowing not only the mean function but the entire density of the output to change with the inputs. Sparse Gaussian processes (GPs) have shown promise as a leading candidate for the experts in such models, and in this article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). Furthermore, a fast one-pass algorithm called Cluster–Classify–Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly. This powerful combination of model and algorithm together delivers a novel method which is flexible, robust, and extremely efficient. In particular, the method is able to outperform competing methods in terms of accuracy and uncertainty quantification. The cost is competitive on low-dimensional and small data sets, but is significantly lower for higher-dimensional and big data sets. Iteratively maximizing the distribution of experts given allocations and allocations given experts does not provide significant improvement, which indicates that the algorithm achieves a good approximation to the local MAP estimator very fast. This insight can be useful also in the context of other mixture of experts models.
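
As a reading aid, the model class described in the abstract can be written in the generic mixture-of-experts form below. The notation is assumed here for illustration and is not taken from the paper: $K$ is the number of experts, $z$ is the allocation of a point to an expert, $h_k(x; \theta)$ is the $k$-th output of the gating DNN with weights $\theta$, and $p(y \mid x, z = k)$ is the predictive density of the $k$-th GP expert.

```latex
p(y \mid x) = \sum_{k=1}^{K} p(z = k \mid x; \theta)\, p(y \mid x, z = k),
\qquad
p(z = k \mid x; \theta) = \frac{\exp\{h_k(x; \theta)\}}{\sum_{j=1}^{K} \exp\{h_j(x; \theta)\}}
```

Because the gating weights depend on $x$, the shape of the whole predictive density, not only its mean, can change across the input space. A softmax output layer is one common choice for the gating probabilities and is an assumption of this sketch.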

What problem does this paper attempt to address?

The paper addresses how to build a flexible, efficient, and robust mixture of experts (MoE) for supervised learning by combining a deep neural network (DNN) gating network with Gaussian process (GP) experts. Specifically, it aims to:

1. **Improve computational efficiency**: Standard GP models are expensive because they must work with large, dense covariance matrices. The proposed method reduces this cost by dividing the data into subsets and assigning a GP expert to each subset.

2. **Enhance model flexibility**: With a DNN as the gating network, the partition of the input space can be learned flexibly, so the regression functions in different regions can have different properties (such as smoothness and variability). This lets the model capture non-stationarity, heterogeneity, discontinuity, and multimodality.

3. **Provide accurate probabilistic predictions**: GP experts supply uncertainty quantification, and the DNN adds expressive power. Together they can give more accurate probabilistic predictions while remaining flexible.

4. **Approximate the maximum a posteriori (MAP) estimator quickly**: To further improve efficiency, the article uses a fast algorithm named Cluster-Classify-Regress (CCR) to approximate the MAP estimator. The algorithm completes the clustering, classification, and regression steps in one pass, which significantly reduces the computational cost, especially on high-dimensional and large data sets.

In summary, the core objective of the paper is a new MoE model that handles large-scale data with complex structure effectively while retaining reliable uncertainty quantification and probabilistic predictions.
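
The three CCR steps summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `KMeans` for the clustering step, `MLPClassifier` as a small stand-in for the gating DNN, and full (not sparse) `GaussianProcessRegressor` experts. The function names `fit_ccr` and `predict_ccr`, the toy data, and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def fit_ccr(X, y, n_experts=2, seed=0):
    """One cluster -> classify -> regress pass (illustrative sketch only)."""
    # 1. Cluster: group the standardized joint (input, output) pairs.
    joint = StandardScaler().fit_transform(np.column_stack([X, y]))
    labels = KMeans(n_clusters=n_experts, n_init=10, random_state=seed).fit_predict(joint)

    # 2. Classify: a small neural network predicts the cluster from the input alone.
    #    Assumption: an MLP stands in for the paper's gating DNN.
    gate = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=seed)
    gate.fit(X, labels)

    # 3. Regress: one GP per cluster, fitted only to that cluster's points.
    #    Assumption: full GPs replace the sparse GPs used in the paper.
    experts = []
    for k in gate.classes_:
        mask = labels == k
        kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.01)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=seed)
        experts.append(gp.fit(X[mask], y[mask]))
    return gate, experts


def predict_ccr(gate, experts, X_new):
    """Mean and standard deviation of the gated mixture at X_new."""
    weights = gate.predict_proba(X_new)  # shape (n, K), columns follow gate.classes_
    stats = [gp.predict(X_new, return_std=True) for gp in experts]
    means = np.column_stack([m for m, _ in stats])
    stds = np.column_stack([s for _, s in stats])
    mean = np.sum(weights * means, axis=1)
    # Mixture variance = expected expert variance + spread of the expert means.
    var = np.sum(weights * (stds**2 + (means - mean[:, None]) ** 2), axis=1)
    return mean, np.sqrt(var), weights


if __name__ == "__main__":
    # Toy data with a jump at x = 0.5, so two experts are a natural fit.
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(200, 1))
    f = np.where(X[:, 0] < 0.5, np.sin(6 * X[:, 0]), 2 + 0.5 * X[:, 0])
    y = f + 0.05 * rng.standard_normal(200)

    gate, experts = fit_ccr(X, y, n_experts=2)
    mean, std, weights = predict_ccr(gate, experts, np.array([[0.25], [0.75]]))
    print(mean, std, weights)
```

Each step here runs exactly once, which is the sense in which CCR is a one-pass procedure. The abstract's observation that iterating between experts and allocations brings little further improvement would correspond to repeating steps 2 and 3 with updated labels.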

Fast deep mixtures of Gaussian process experts

Gaussian Process-Gated Hierarchical Mixtures of Experts

Hierarchical Mixture-of-Experts Model for Large-Scale Gaussian Process Regression

Generalized Product of Experts for Automatic and Principled Fusion of Gaussian Process Predictions

Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts

Mixture of Gaussian Processes and its Applications

An Effective EM Algorithm for Mixtures of Gaussian Processes Via the MCMC Sampling and Approximation

An Efficient EM Approach to Parameter Learning of the Mixture of Gaussian Processes

Mixture of robust Gaussian processes and its hard-cut EM algorithm with variational bounding approximation

Neural-g: A Deep Learning Framework for Mixing Density Estimation

Efficient Learning Algorithms for Gaussian Processes

Learning Mixtures of Gaussians Using Diffusion Models

A Precise Hard-Cut EM Algorithm for Mixtures of Gaussian Processes

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

An MCMC Based EM Algorithm for Mixtures of Gaussian Processes

Gaussian Graphical Models as an Ensemble Method for Distributed Gaussian Processes

Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts

A Two-Layer Mixture Model of Gaussian Process Functional Regressions and Its MCMC EM Algorithm

Transductive Log Opinion Pool of Gaussian Process Experts