Abstract: Inducing point Gaussian process approximations are often considered a gold standard in uncertainty estimation since they retain many of the properties of the exact GP and scale to large datasets. A major drawback is that they have difficulty scaling to high dimensional inputs. Deep Kernel Learning (DKL) promises a solution: a deep feature extractor transforms the inputs over which an inducing point Gaussian process is defined. However, DKL has been shown to provide unreliable uncertainty estimates in practice. We study why, and show that with no constraints, the DKL objective pushes "far-away" data points to be mapped to the same features as those of training-set points. With this insight we propose to constrain DKL's feature extractor to approximately preserve distances through a bi-Lipschitz constraint, resulting in a feature space favorable to DKL. We obtain a model, DUE, which demonstrates uncertainty quality outperforming previous DKL and other single forward pass uncertainty methods, while maintaining the speed and accuracy of standard neural networks.

What problem does this paper attempt to address?

The main problem this paper addresses is uncertainty estimation in deep learning models using a single forward pass. Specifically, the authors focus on why Deep Kernel Learning (DKL) performs poorly on high-dimensional inputs, and in particular why its uncertainty estimates are inaccurate on out-of-distribution (OoD) data.

### Main problems:

1. **Feature collapse in DKL**:
   - When the deep feature extractor is unconstrained, DKL may map points far from the training data to the same feature-space positions as training points. The model then becomes over-confident on OoD data, and its uncertainty estimates become unreliable.
2. **Scalability to high-dimensional inputs**:
   - Inducing point Gaussian process (GP) approximations are difficult to scale to high-dimensional inputs. DKL can in principle solve this by using a deep feature extractor to transform the inputs, but in practice its uncertainty estimates are poor.

### Solutions:

To solve these problems, the authors propose a new model, Deterministic Uncertainty Estimation (DUE). DUE places a bi-Lipschitz constraint on the deep feature extractor so that distances between inputs are approximately preserved in feature space, which avoids feature collapse. Specific improvements include:

- **Bi-Lipschitz constraints on the feature extractor**: Residual connections combined with spectral normalization make the feature extractor both sensitive to input changes (distinct inputs stay distinct) and smooth (small input changes produce small feature changes), preventing feature collapse.
- **Simplifying the training process**: DUE can be trained directly from scratch without pre-training, training is stable, and the computational cost is comparable to that of a standard softmax model.
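The bi-Lipschitz idea above can be illustrated with a small numerical sketch. This is not the paper's implementation: it is a toy NumPy example, and the names `spectral_normalize`, `residual_block`, `distance_ratios` and the coefficient `coeff = 0.9` are assumptions made for illustration. It relies on a standard fact: if a residual branch g has Lipschitz constant c < 1, then f(x) = x + g(x) satisfies (1 - c)·||x1 - x2|| ≤ ||f(x1) - f(x2)|| ≤ (1 + c)·||x1 - x2||. A collapsed (constant) feature map is included for contrast.

```python
import numpy as np

def spectral_normalize(W, coeff=0.9):
    """Rescale W so its largest singular value is at most `coeff`.

    Assumption: the exact spectral norm is used here for clarity;
    practical implementations typically estimate it by power iteration.
    """
    sigma = np.linalg.norm(W, 2)  # largest singular value of W
    return W * min(1.0, coeff / sigma)

def residual_block(x, W, b):
    """f(x) = x + relu(W x + b); the branch is Lipschitz with constant <= ||W||_2."""
    return x + np.maximum(W @ x + b, 0.0)

def distance_ratios(f, xs):
    """Return ||f(xi) - f(xj)|| / ||xi - xj|| for every pair of inputs."""
    feats = np.stack([f(x) for x in xs])
    ratios = []
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            num = np.linalg.norm(feats[i] - feats[j])
            den = np.linalg.norm(xs[i] - xs[j])
            ratios.append(num / den)
    return np.array(ratios)

rng = np.random.default_rng(0)
dim, coeff = 8, 0.9  # hypothetical toy settings
W = spectral_normalize(rng.normal(size=(dim, dim)), coeff)
b = rng.normal(size=dim)
xs = rng.normal(size=(50, dim))

# Constrained residual block: distances are approximately preserved.
ratios = distance_ratios(lambda x: residual_block(x, W, b), xs)

# Collapsed extractor: every input maps to the same feature vector.
collapsed = distance_ratios(lambda x: np.zeros(dim), xs)

print("constrained: min ratio", ratios.min(), "max ratio", ratios.max())
print("collapsed:   max ratio", collapsed.max())
```

For the constrained block, every ratio falls within [1 - coeff, 1 + coeff], so no two distinct inputs can land on the same features. For the collapsed map, every ratio is zero, which is the situation in which a GP defined on the features cannot tell far-away inputs from training points.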
### Experimental results:

- In the CIFAR-10 vs SVHN detection task, DUE significantly outperforms other single-forward-pass uncertainty methods.
- In the regression task of personalized medicine, DUE shows better prediction performance and more accurate uncertainty estimation, and can correctly refer patients to experts when the data-overlap assumption does not hold.

In conclusion, this paper addresses the inaccurate uncertainty estimation of DKL on high-dimensional input data by introducing the DUE model, providing an effective single-forward-pass uncertainty estimation method.

On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty
