Abstract: Achieving robust uncertainty quantification for deep neural networks
represents an important requirement in many real-world applications of deep
learning such as medical imaging where it is necessary to assess the
reliability of a neural network's prediction. Bayesian neural networks are a
promising approach for modeling uncertainties in deep neural networks.
Unfortunately, generating samples from the posterior distribution of neural
networks is a major challenge. One significant advance in that direction would
be the incorporation of adaptive step sizes, similar to modern neural network
optimizers, into Markov chain Monte Carlo (MCMC) sampling algorithms without
significantly increasing computational demand. Over the past years, several
papers have introduced sampling algorithms with claims that they achieve this
property. However, do they indeed converge to the correct distribution? In this
paper, we demonstrate that these methods can have a substantial bias in the
distribution they sample, even in the limit of vanishing step sizes and at full
batch size.
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve
This paper addresses the problem of sampling from the posterior distribution of Bayesian Neural Networks (BNNs). Specifically, it examines whether the adaptive step sizes used by modern neural network optimizers can be incorporated into Markov chain Monte Carlo (MCMC) sampling algorithms without significantly increasing computational demands, and whether existing algorithms that claim to do so actually sample from the correct distribution.
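To make the object of study concrete, here is a minimal sketch of a Langevin step whose step size is preconditioned by an RMSprop-style running average of squared gradients, in the spirit of preconditioned SGLD. It is an illustration under our own assumptions, not the exact algorithm from any of the papers discussed: the function name `rmsprop_langevin_step`, the constants `alpha` and `lam`, and the scaling convention (drift `step_size`, noise variance `2 * step_size`) are hypothetical choices. The step deliberately omits any correction term; that omission is exactly the kind of shortcut whose effect the paper analyzes.

```python
import numpy as np

def rmsprop_langevin_step(theta, v, grad_log_post, step_size, rng,
                          alpha=0.99, lam=1e-5):
    """One RMSprop-preconditioned Langevin step WITHOUT a correction term.

    theta:          current parameters (1-D array)
    v:              running average of squared gradients (same shape as theta)
    grad_log_post:  function returning the full-batch gradient of log p(theta | data)
    """
    g = grad_log_post(theta)
    v = alpha * v + (1.0 - alpha) * g * g           # RMSprop second-moment estimate
    G = 1.0 / (lam + np.sqrt(v))                    # diagonal, position-dependent step size
    noise = rng.standard_normal(theta.shape)
    # Preconditioned drift plus noise whose variance also scales with G
    theta = theta + step_size * G * g + np.sqrt(2.0 * step_size * G) * noise
    return theta, v

# Usage: 3 parameters with a standard normal "posterior", grad log p = -theta
rng = np.random.default_rng(0)
theta, v = np.ones(3), np.zeros(3)
for _ in range(1000):
    theta, v = rmsprop_langevin_step(theta, v, lambda t: -t, 1e-2, rng)
```

Because `G` is recomputed from the gradient history at every step, the noise scale varies with the position of the chain; this is why dropping the correction term can change which distribution the chain samples.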
### Background
1. **Importance of Bayesian Neural Networks**:
- In many practical applications, such as medical imaging, it is necessary to assess the reliability of neural network predictions.
- Bayesian Neural Networks are a promising approach to modeling uncertainty in deep networks; they can also improve robustness to outliers and yield high-quality uncertainty estimates.
2. **Sampling Challenges**:
- Generating samples from the Bayesian posterior distribution is a major challenge.
- Standard MCMC methods, while theoretically guaranteed to converge to the true posterior distribution, are computationally expensive, especially for deep neural networks.
3. **Limitations of Existing Methods**:
- In recent years, several papers have proposed sampling algorithms that claim to incorporate adaptive step sizes without significantly increasing computational demands.
- Whether these methods actually converge to the correct distribution is the main research question of this paper.
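For contrast, the bias of a plain, non-adaptive Langevin sampler does vanish as the step size shrinks. The sketch below is an illustration we added, not taken from the paper: it runs the unadjusted Langevin algorithm on a standard Gaussian target, U(θ) = θ²/2, where the stationary variance of the discretized chain is exactly 1/(1 - ε/2). The function name `ula_gaussian` and the chain and step counts are hypothetical choices.

```python
import numpy as np

def ula_gaussian(eps, n_chains=20000, n_steps=1000, seed=0):
    """Unadjusted Langevin for U(theta) = theta**2 / 2 (standard normal target).

    Update: theta <- theta - eps * U'(theta) + sqrt(2 * eps) * xi, xi ~ N(0, 1).
    Runs many independent chains in parallel and returns their final states.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_chains)
    for _ in range(n_steps):
        theta = theta - eps * theta + np.sqrt(2.0 * eps) * rng.standard_normal(n_chains)
    return theta

for eps in (0.5, 0.1, 0.01):
    empirical = ula_gaussian(eps).var()
    exact = 1.0 / (1.0 - eps / 2.0)  # stationary variance of the discretized chain
    print(f"eps={eps}: empirical {empirical:.3f}, exact {exact:.3f}, target 1.000")
```

As ε → 0 the stationary variance approaches 1, the exact target variance. The paper's point is that adaptive step size samplers without correction terms lack this property.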
### Main Research Questions
1. **Do the proposed algorithms indeed converge to a distribution close to the true posterior?**
2. **What are the consequences of using adaptive step sizes without including correction terms in the dynamics?**
### Methods
1. **Theoretical Analysis**:
- The paper analyzes the ergodic properties of the stochastic differential equations (SDEs) underlying adaptive step size samplers to determine which stationary distribution they converge to.
- It shows that, without the appropriate correction terms, adaptive step sizes change the stationary distribution and can introduce significant bias.
2. **Experimental Validation**:
- The paper empirically estimates the stationary densities of the different algorithms and compares them with the predictions of the theoretical analysis.
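A one-dimensional calculation shows why the correction term matters. This is a standard Fokker-Planck argument added here for illustration, not the paper's general proof. Assume a target π(θ) ∝ e^{-U(θ)}, a position-dependent step size G(θ) > 0, and a switch γ ∈ {0, 1} that turns the correction term G'(θ) off or on; setting the stationary probability flux to zero gives the sampled density p:

```latex
\begin{aligned}
d\theta_t &= \big[-G(\theta_t)\,U'(\theta_t) + \gamma\,G'(\theta_t)\big]\,dt + \sqrt{2\,G(\theta_t)}\,dW_t,\\
0 &= \big[-G\,U' + \gamma\,G'\big]\,p - \big(G\,p\big)'
  \;\Longrightarrow\; \frac{p'}{p} = -U' + (\gamma - 1)\,\frac{G'}{G},\\
p(\theta) &\propto G(\theta)^{\gamma - 1}\, e^{-U(\theta)} = G(\theta)^{\gamma - 1}\,\pi(\theta).
\end{aligned}
```

With γ = 1 the chain samples π; with γ = 0 (no correction) it samples π/G, so regions where the adaptive step size is large are systematically under-sampled, regardless of how small the discretization step is.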
### Results
1. **Theoretical Results**:
- The paper shows that existing adaptive step size methods can introduce substantial bias, so that the distribution they sample differs markedly from the target distribution, even in the limit of vanishing step sizes and at full batch size.
- In particular, the sampled density can have deep local minima exactly where the target distribution has its global maxima.
2. **Experimental Results**:
- The empirically estimated stationary densities of the tested algorithms differ markedly from the target distribution, consistent with the theoretical analysis.
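The effect can be reproduced numerically in one dimension. The sketch below assumes a standard normal target and an idealized RMSprop-like step size G(θ) = 1/(|U'(θ)| + λ) that depends only on the current gradient (real RMSprop uses a running average, so this is a simplification); `LAM`, `simulate`, and `central_mass` are hypothetical names. It integrates the preconditioned Langevin SDE with Euler-Maruyama over many independent chains, with and without the correction term G'(θ), and compares the mass near the mode θ = 0 with the target value P(|θ| < 0.25) ≈ 0.197. Without correction, the stationary density for this G is proportional to (|θ| + λ)e^{-θ²/2}, which places only about 0.095 of the mass in that interval when λ = 0.5.

```python
import numpy as np

LAM = 0.5  # hypothetical damping constant lambda of the idealized step size

def grad_U(theta):
    # U(theta) = theta**2 / 2, so the target pi is a standard normal
    return theta

def G(theta):
    # Idealized RMSprop-like step size: 1 / (|U'(theta)| + lambda)
    return 1.0 / (np.abs(grad_U(theta)) + LAM)

def dG(theta):
    # Derivative G'(theta) for U'(theta) = theta (the correction term)
    return -np.sign(theta) / (np.abs(theta) + LAM) ** 2

def simulate(correct, n_chains=5000, n_steps=8000, eps=2e-3, seed=0):
    """Euler-Maruyama for d(theta) = [-G U' + gamma G'] dt + sqrt(2 G) dW."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_chains)  # start all chains from the target
    gamma = 1.0 if correct else 0.0
    for _ in range(n_steps):
        g = G(theta)
        drift = -g * grad_U(theta) + gamma * dG(theta)
        noise = rng.standard_normal(n_chains)
        theta = theta + eps * drift + np.sqrt(2.0 * eps * g) * noise
    return theta

def central_mass(samples, r=0.25):
    # Fraction of chains with |theta| < r, a crude density estimate at the mode
    return float(np.mean(np.abs(samples) < r))

corrected = simulate(correct=True)
uncorrected = simulate(correct=False)
print("with correction:   ", central_mass(corrected))    # should be near 0.197
print("without correction:", central_mass(uncorrected))  # should be near 0.095
```

Because G is largest at the mode (the gradient vanishes there), the uncorrected chain is pushed away from θ = 0. As λ → 0 its stationary density at the mode tends to zero: a deep local minimum exactly where the target has its global maximum.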
### Discussion and Outlook
1. **Limitations of Existing Methods**:
- By introducing adaptive step sizes without correction, current methods significantly alter the stationary distribution and therefore cannot be regarded as exact samplers of the Bayesian posterior.
- Although these methods may still perform well on certain tasks, users should be aware of the biases they introduce.
2. **Directions for Improvement**:
- The paper proposes ways to fix algorithms such as pSGLD and SGRLD by rescaling the correction terms, but the resulting methods remain computationally expensive.
- It emphasizes that incorporating adaptive step sizes into diffusion-based samplers without increasing computational cost remains difficult and will require new ideas.
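The cost of correcting an adaptive sampler is easiest to see in d dimensions. The first line below is the standard correction for a position-dependent preconditioner G(θ), with U = -log posterior. Applying it to an idealized diagonal, RMSprop-like preconditioner with damping λ (our simplifying assumption, which ignores the running average used in practice) shows that every step needs the diagonal of the Hessian of U. The paper's own rescaling fix is not reproduced here.

```latex
\begin{aligned}
d\theta_t &= \big[-G(\theta_t)\,\nabla U(\theta_t) + \Gamma(\theta_t)\big]\,dt + \sqrt{2\,G(\theta_t)}\,dW_t,
\qquad \Gamma_i(\theta) = \sum_{j} \frac{\partial G_{ij}(\theta)}{\partial \theta_j},\\
G(\theta) &= \operatorname{diag}\!\Big(\frac{1}{\lvert \partial_1 U(\theta)\rvert + \lambda},\,\dots,\,\frac{1}{\lvert \partial_d U(\theta)\rvert + \lambda}\Big)
\;\Longrightarrow\;
\Gamma_i(\theta) = -\,\frac{\operatorname{sign}\!\big(\partial_i U(\theta)\big)\,\partial_i^2 U(\theta)}{\big(\lvert \partial_i U(\theta)\rvert + \lambda\big)^2}.
\end{aligned}
```

The exact Hessian diagonal generally cannot be obtained from the single gradient evaluation per step that makes adaptive optimizers cheap, which is one way to see why exact corrections are costly for large networks.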
### Summary
Through theoretical analysis and empirical validation, this paper reveals the shortcomings of existing adaptive step size methods for sampling from Bayesian Neural Network posteriors and points out directions for future research.