Abstract: In many real-world problems, there is a limited set of training data, but an abundance of unlabeled data. We propose a new method, Generative Posterior Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems. A GPN is a generative model that, given a prior distribution over functions, approximates the posterior distribution directly by regularizing the network towards samples from the prior. We prove theoretically that our method indeed approximates the Bayesian posterior and show empirically that it improves epistemic uncertainty estimation and scalability over competing methods.
What problem does this paper attempt to address?
This paper addresses how to use unlabeled data to estimate epistemic uncertainty in high-dimensional problems, in practical settings where labeled training data is limited but unlabeled data is abundant. Specifically, the authors propose a new method, Generative Posterior Networks (GPNs), which approximate the posterior distribution directly by regularizing the network towards samples from the prior. The paper proves that the method approximates the Bayesian posterior, and experiments show that it improves on existing methods in both epistemic uncertainty estimation and scalability.
### Background of the Paper and Problem Statement
In supervised learning, the distribution of the training data often differs from the distribution the model encounters at deployment. In safety-critical settings, this distribution shift can lead to serious errors. Ideally, a model should estimate its epistemic uncertainty, that is, the uncertainty that arises from a lack of data, but deep learning models struggle to estimate this type of uncertainty. Although many methods have been proposed, epistemic uncertainty estimation remains an open problem, particularly with respect to handling out-of-distribution (OOD) data and scaling to large problems.
### Method Overview
**Generative Posterior Networks (GPNs)**
- **Objective**: Use unlabeled data to estimate epistemic uncertainty.
- **Method**: A GPN is a generative model that approximates the posterior distribution directly by regularizing the network towards samples from the prior.
- **Theoretical Basis**: The paper proves that GPNs can indeed approximate the Bayesian posterior distribution.
- **Experimental Results**: Experiments show that GPNs outperform existing methods in epistemic uncertainty estimation and scale better.
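Why regularizing towards prior samples can produce posterior samples is easiest to see in the one setting where it holds exactly: Bayesian linear regression with a Gaussian prior and Gaussian noise. The sketch below uses the well-known "sample-then-optimize" (randomized prior) trick in parameter space; it is not the authors' GPN objective, which trains a generative network using unlabeled data. The linear model, the function names `analytic_posterior` and `sample_then_optimize`, and treating `sigma_eps` as a noise standard deviation are all assumptions made for illustration.

```python
import numpy as np

def analytic_posterior(X, y, mu0, Sigma0, sigma_eps):
    # Exact Gaussian posterior for y = X @ theta + eps, theta ~ N(mu0, Sigma0),
    # eps ~ N(0, sigma_eps^2) (sigma_eps assumed to be a standard deviation).
    prec0 = np.linalg.inv(Sigma0)
    Sigma_post = np.linalg.inv(prec0 + X.T @ X / sigma_eps**2)
    mu_post = Sigma_post @ (prec0 @ mu0 + X.T @ y / sigma_eps**2)
    return mu_post, Sigma_post

def sample_then_optimize(X, y, mu0, Sigma0, sigma_eps, rng):
    # Draw a prior sample theta0 and noise-perturbed targets, then minimize
    #   ||y_tilde - X theta||^2 / sigma^2 + (theta - theta0)^T Sigma0^{-1} (theta - theta0),
    # i.e. fit the data while regularizing towards the prior sample.
    theta0 = rng.multivariate_normal(mu0, Sigma0)
    y_tilde = y + sigma_eps * rng.standard_normal(len(y))
    prec0 = np.linalg.inv(Sigma0)
    A = X.T @ X / sigma_eps**2 + prec0
    b = X.T @ y_tilde / sigma_eps**2 + prec0 @ theta0
    return np.linalg.solve(A, b)  # closed-form minimizer of the quadratic loss

rng = np.random.default_rng(0)
n, d, sigma_eps = 20, 3, 0.3
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + sigma_eps * rng.standard_normal(n)
mu0, Sigma0 = np.zeros(d), np.eye(d)

mu_post, Sigma_post = analytic_posterior(X, y, mu0, Sigma0, sigma_eps)
samples = np.array([sample_then_optimize(X, y, mu0, Sigma0, sigma_eps, rng)
                    for _ in range(10_000)])
# The empirical moments of the optimized samples should match the exact posterior.
print(np.abs(samples.mean(axis=0) - mu_post).max())
print(np.abs(np.cov(samples, rowvar=False) - Sigma_post).max())
```

Because the loss is quadratic, each optimized \(\theta\) is an exact posterior draw in this toy case; with a nonlinear network the same recipe is only approximate.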
### Related Work
- **Types of Uncertainty**: Uncertainty is divided into two types: aleatoric uncertainty and epistemic uncertainty. Aleatoric uncertainty refers to the inherent uncertainty in the system itself, which exists even when the true parameters of the system are known; epistemic uncertainty refers to the modeling uncertainty that can be reduced by collecting more data.
- **Existing Methods**: Deep learning models perform well at quantifying aleatoric uncertainty but still struggle to estimate epistemic uncertainty. Existing methods include Gaussian Processes (GPs), Spectral Normalized Neural Gaussian Processes (SNGP), and Deterministic Uncertainty Quantification (DUQ), but these methods either perform poorly on high-dimensional problems or are computationally expensive.
- **Bayesian Inference**: Epistemic uncertainty estimation can be framed as a Bayesian inference problem: given a prior distribution over functions and some labeled data, the goal is to construct the Bayesian posterior distribution. Because the posterior is usually intractable to compute exactly, approximate inference methods such as Markov Chain Monte Carlo (MCMC) sampling and variational inference are used instead.
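To make the aleatoric/epistemic split concrete: if a prediction at an input \(x\) is an equally weighted mixture over several parameter settings (for example ensemble members or posterior samples), the law of total variance gives \(\mathrm{Var}[y\mid x] = \mathbb{E}_\theta[\mathrm{Var}(y\mid x,\theta)] + \mathrm{Var}_\theta[\mathbb{E}(y\mid x,\theta)]\), where the first term is aleatoric and the second is epistemic. This is a standard decomposition rather than something specific to the paper; the per-model predictions below are hypothetical, and the Monte Carlo step only checks the identity numerically.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    # means[m], variances[m]: predictive mean and noise variance of model m at one input x.
    aleatoric = variances.mean()   # E_theta[ Var(y | x, theta) ]
    epistemic = means.var()        # Var_theta[ E(y | x, theta) ] (population variance, ddof=0)
    return aleatoric, epistemic, aleatoric + epistemic

# Hypothetical predictions from M = 5 models at a single input x.
means = np.array([0.9, 1.1, 1.0, 1.4, 0.6])
variances = np.full(5, 0.04)
aleatoric, epistemic, total = decompose_uncertainty(means, variances)

# Monte Carlo check: pick a model uniformly at random, then sample y from its Gaussian.
rng = np.random.default_rng(1)
idx = rng.integers(0, len(means), size=200_000)
y = rng.normal(means[idx], np.sqrt(variances[idx]))
print(aleatoric, epistemic, total, y.var())
```

Here the models disagree more (epistemic term 0.068) than their individual noise levels (aleatoric term 0.04); collecting data near \(x\) would shrink only the first of these.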
### Problem Definition and Background
- **Bayesian Inference**: Suppose a function \(f(\cdot; \theta)\) has parameters \(\theta\) with a Gaussian prior \(\theta\sim\mathcal{N}(\mu_{\text{prior}}, \Sigma_{\text{prior}})\), whose density we denote \(P_{\text{prior}}(\theta)\), and we observe a noisy dataset \((x_{\text{obs}}, y_{\text{obs}})\) with \(y_{\text{obs}} = f(x_{\text{obs}}; \theta)+\epsilon\), where \(\epsilon\sim\mathcal{N}(0, \sigma_\epsilon)\).
- **Likelihood Function**: The data likelihood is defined as \(P_{\text{like}}(y_{\text{obs}}\mid\theta, x_{\text{obs}})=\prod_i\mathcal{N}\big(y_{\text{obs}}^i\mid f(x_{\text{obs}}^i; \theta), \sigma_\epsilon\big)\).
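For a model that is linear in its parameters, the Gaussian prior and Gaussian likelihood in this setup give a closed-form posterior, so epistemic uncertainty can be computed exactly. The sketch below assumes a hypothetical feature map with \(f(x;\theta)=\theta_0+\theta_1 x\) and treats \(\sigma_\epsilon\) as a standard deviation (the notation \(\mathcal{N}(0,\sigma_\epsilon)\) leaves this ambiguous). It evaluates the log-likelihood and shows how the posterior spread of \(f(x;\theta)\) widens away from the observed inputs; the helper names are illustrative only.

```python
import numpy as np

def features(x):
    # Hypothetical feature map so that f(x; theta) = theta_0 + theta_1 * x.
    return np.stack([np.ones_like(x), x], axis=-1)

def f(x, theta):
    return features(x) @ theta

def log_likelihood(theta, x_obs, y_obs, sigma_eps):
    # log prod_i N(y_i | f(x_i; theta), sigma_eps^2), with sigma_eps a standard deviation.
    resid = y_obs - f(x_obs, theta)
    return np.sum(-0.5 * (resid / sigma_eps) ** 2 - np.log(sigma_eps) - 0.5 * np.log(2 * np.pi))

def gaussian_posterior(x_obs, y_obs, mu_prior, Sigma_prior, sigma_eps):
    # Conjugate update: posterior precision = prior precision + Phi^T Phi / sigma^2.
    Phi = features(x_obs)
    prec_prior = np.linalg.inv(Sigma_prior)
    Sigma_post = np.linalg.inv(prec_prior + Phi.T @ Phi / sigma_eps**2)
    mu_post = Sigma_post @ (prec_prior @ mu_prior + Phi.T @ y_obs / sigma_eps**2)
    return mu_post, Sigma_post

def epistemic_std(x, Sigma_post):
    # Posterior std of f(x; theta): sqrt(phi(x)^T Sigma_post phi(x)) for each x.
    Phi = features(x)
    return np.sqrt(np.einsum("nd,de,ne->n", Phi, Sigma_post, Phi))

rng = np.random.default_rng(0)
x_obs = rng.uniform(-1.0, 1.0, size=30)  # observed inputs cover only [-1, 1]
sigma_eps = 0.1
y_obs = 1.0 + 2.0 * x_obs + sigma_eps * rng.standard_normal(30)

mu_post, Sigma_post = gaussian_posterior(x_obs, y_obs, np.zeros(2), np.eye(2), sigma_eps)
x_test = np.array([0.0, 10.0])  # in-distribution input vs. far out-of-distribution input
stds = epistemic_std(x_test, Sigma_post)
print(mu_post, stds)
print(log_likelihood(mu_post, x_obs, y_obs, sigma_eps))
```

The epistemic standard deviation at \(x=10\) is much larger than at \(x=0\), which is the behaviour the paper wants to recover in high-dimensional settings where no closed form exists.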