On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty

Joost van Amersfoort, Lewis Smith, Andrew Jesson, Oscar Key, Yarin Gal
DOI: https://doi.org/10.48550/arXiv.2102.11409
2022-03-07
Abstract: Inducing point Gaussian process approximations are often considered a gold standard in uncertainty estimation since they retain many of the properties of the exact GP and scale to large datasets. A major drawback is that they have difficulty scaling to high dimensional inputs. Deep Kernel Learning (DKL) promises a solution: a deep feature extractor transforms the inputs over which an inducing point Gaussian process is defined. However, DKL has been shown to provide unreliable uncertainty estimates in practice. We study why, and show that with no constraints, the DKL objective pushes "far-away" data points to be mapped to the same features as those of training-set points. With this insight we propose to constrain DKL's feature extractor to approximately preserve distances through a bi-Lipschitz constraint, resulting in a feature space favorable to DKL. We obtain a model, DUE, which demonstrates uncertainty quality outperforming previous DKL and other single forward pass uncertainty methods, while maintaining the speed and accuracy of standard neural networks.
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is uncertainty estimation in deep learning models using only a single forward pass. Specifically, the authors focus on why Deep Kernel Learning (DKL) gives unreliable uncertainty estimates on high-dimensional inputs, especially for out-of-distribution (OoD) data.

### Main problems:

1. **Feature collapse in DKL**:
   - When the deep feature extractor is unconstrained, the DKL objective can map points far from the training data onto the same features as training points. The model then becomes overconfident on OoD data, and its uncertainty estimates become unreliable.
2. **Scalability to high-dimensional inputs**:
   - Inducing point Gaussian process (GP) approximations scale to large datasets but are difficult to scale to high-dimensional inputs. DKL promises to solve this by defining the GP over the output of a deep feature extractor, but in practice its uncertainty estimates are poor.

### Solutions:

To solve these problems, the authors propose a new model, Deterministic Uncertainty Estimation (DUE). DUE constrains the deep feature extractor to be bi-Lipschitz, so that distances in input space are approximately preserved in feature space, which avoids feature collapse. Specific improvements include:

- **Bi-Lipschitz constraints on the feature extractor**: Residual connections combined with spectral normalization keep the feature extractor both sensitive to input changes (distinct inputs stay distinct) and smooth (nearby inputs stay nearby), preventing feature collapse.
- **A simpler training process**: DUE can be trained from scratch without pre-training, training is stable, and the computational cost is comparable to that of a standard softmax model.
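The bi-Lipschitz constraint described above can be illustrated with a small standard-library Python sketch. It is not the authors' implementation. It assumes a single residual block `f(x) = x + tanh(W x)` whose weight matrix is rescaled so that its spectral norm is at most `c < 1`; the matrix `W_RAW`, the constant `C`, the sample points, and all function names are hypothetical. Under that assumption the block satisfies `(1 - c) * ||x1 - x2|| <= ||f(x1) - f(x2)|| <= (1 + c) * ||x1 - x2||`, so two distinct inputs can never land on the same features.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))


def distance(a, b):
    """Euclidean distance between two vectors."""
    return norm([ai - bi for ai, bi in zip(a, b)])


def spectral_norm(W, n_iter=200):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = [list(col) for col in zip(*W)]
    v = [1.0] * len(W[0])
    for _ in range(n_iter):
        u = matvec(Wt, matvec(W, v))
        n = norm(u)
        if n == 0.0:  # W maps v to zero; treat the norm as zero
            return 0.0
        v = [ui / n for ui in u]
    return norm(matvec(W, v))


def spectrally_normalize(W, c):
    """Shrink W so its spectral norm is at most c; leave it alone if already smaller."""
    sigma = spectral_norm(W)
    scale = 1.0 if sigma <= c else c / sigma
    return [[scale * w for w in row] for row in W]


def residual_block(W, x):
    """f(x) = x + tanh(W x). tanh is 1-Lipschitz, so the branch has Lipschitz constant <= ||W||."""
    branch = [math.tanh(h) for h in matvec(W, x)]
    return [xi + bi for xi, bi in zip(x, branch)]


def rbf_kernel(a, b, length_scale=1.0):
    """RBF kernel; equals exactly 1.0 when both feature vectors coincide."""
    return math.exp(-distance(a, b) ** 2 / (2.0 * length_scale ** 2))


# Hypothetical weights and inputs, chosen only for illustration.
W_RAW = [[0.5, -1.2, 0.3], [0.8, 0.1, -0.4], [-0.6, 0.9, 1.1]]
C = 0.9
W = spectrally_normalize(W_RAW, C)

x1 = [0.2, -1.0, 0.5]
x2 = [1.5, 0.3, -0.7]
ratio = distance(residual_block(W, x1), residual_block(W, x2)) / distance(x1, x2)
print("distance ratio:", ratio, "bounds:", 1 - C, 1 + C)

# A collapsed extractor sends every input to the same features, so the kernel
# cannot distinguish x1 from x2; the constrained block keeps them apart.
collapsed = lambda x: [0.0, 0.0, 0.0]
print("kernel, collapsed features:", rbf_kernel(collapsed(x1), collapsed(x2)))
print("kernel, residual features:", rbf_kernel(residual_block(W, x1), residual_block(W, x2)))
```

The contrast at the end is the point of the sketch: a kernel defined on collapsed features sees two different inputs as identical, whereas the lower bound `(1 - c)` guarantees a nonzero feature distance for any two distinct inputs.
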
### Experimental results:

- In the CIFAR-10 vs SVHN detection task, DUE outperforms other single forward pass uncertainty methods.
- In a personalized medicine regression task, DUE shows better predictive performance and more accurate uncertainty estimates, and can correctly refer patients to experts when the data-overlap assumption does not hold.

In conclusion, the paper addresses the unreliable uncertainty estimates of DKL on high-dimensional inputs by introducing DUE, an effective single forward pass uncertainty estimation method.
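The summary does not say how the detection task is scored. A common choice for this kind of OoD benchmark is AUROC over per-input uncertainty scores, and the sketch below assumes that metric. The scores are made up for illustration and are not results from the paper; `auroc` is a hypothetical helper name.

```python
def auroc(in_dist_scores, ood_scores):
    """Fraction of (OoD, in-distribution) pairs where the OoD input has the
    higher uncertainty score; ties count as half a win."""
    wins = 0.0
    for o in ood_scores:
        for i in in_dist_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood_scores) * len(in_dist_scores))


# Hypothetical uncertainty scores (higher means less certain).
in_dist = [0.05, 0.10, 0.20, 0.40]  # e.g. inputs from the training distribution
ood = [0.30, 0.60, 0.80]            # e.g. inputs from a different dataset

print(auroc(in_dist, ood))
```

A value of 1.0 means every OoD input is ranked as more uncertain than every in-distribution input, and 0.5 corresponds to chance. The pairwise loop is quadratic in the number of scores, which is fine for a sketch but too slow for full test sets.
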