Abstract: Achieving robust uncertainty quantification for deep neural networks is an important requirement in many real-world applications of deep learning, such as medical imaging, where it is necessary to assess the reliability of a neural network's prediction. Bayesian neural networks are a promising approach for modeling uncertainties in deep neural networks. Unfortunately, generating samples from the posterior distribution of neural networks is a major challenge. One significant advance in that direction would be the incorporation of adaptive step sizes, similar to modern neural network optimizers, into Markov chain Monte Carlo sampling algorithms without significantly increasing computational demand. Over the past years, several papers have introduced sampling algorithms with claims that they achieve this property. However, do they indeed converge to the correct distribution? In this paper, we demonstrate that these methods can have a substantial bias in the distribution they sample, even in the limit of vanishing step sizes and at full batch size.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses posterior sampling in Bayesian Neural Networks (BNNs). Specifically, it examines whether adaptive step sizes from modern neural network optimizers can be incorporated into Markov chain Monte Carlo (MCMC) sampling algorithms without significantly increasing computational demands, and whether the algorithms that claim to do so still sample the correct distribution.

### Background

1. **Importance of Bayesian Neural Networks**:
   - In many practical applications, such as medical imaging, it is necessary to assess the reliability of neural network predictions.
   - Bayesian Neural Networks are a promising approach that can model uncertainty, improve robustness to outliers, and achieve high-quality uncertainty quantification.
2. **Sampling Challenges**:
   - Generating samples from the Bayesian posterior distribution is a major challenge.
   - Standard MCMC methods, while theoretically capable of converging to the true posterior distribution, are computationally expensive, especially for deep neural networks.
3. **Limitations of Existing Methods**:
   - In recent years, several papers have proposed new sampling algorithms and claimed that they achieve adaptive step sizes without significantly increasing computational demands.
   - However, do these methods truly converge to the correct distribution? This is the main research question of this paper.

### Main Research Questions

1. **Do the proposed algorithms indeed converge to a distribution close to the true posterior?**
2. **What are the consequences of using adaptive step sizes without including correction terms in the dynamics?**

### Methods

1. **Theoretical Analysis**:
   - By analyzing the ergodic properties of Stochastic Differential Equations (SDEs), the paper explores the convergence of adaptive step size methods.
   - It points out that without correction terms, adaptive step size methods may introduce significant bias.
2. **Experimental Validation**:
   - By empirically estimating the stationary densities of different algorithms, the paper validates the results of the theoretical analysis.

### Results

1. **Theoretical Results**:
   - The paper shows that existing adaptive step size methods can introduce significant bias even in the limit of vanishing step sizes and at full batch size, leading to a substantial difference between the sampled distribution and the target distribution.
   - Specifically, the sampled distribution may have deep local minima where the target distribution has its global maxima.
2. **Experimental Results**:
   - Empirical results show significant differences between the stationary densities of the algorithms and the target distribution, consistent with the theoretical analysis.

### Discussion and Outlook

1. **Limitations of Existing Methods**:
   - Current methods significantly alter the stationary distribution when introducing adaptive step sizes, and thus cannot be considered true Bayesian posterior sampling algorithms.
   - Although these methods may still perform well on certain tasks, users should be aware of the biases they introduce.
2. **Directions for Improvement**:
   - The paper proposes ways to fix algorithms like PSGLD and SGRLD by rescaling correction terms, but these fixes remain computationally expensive.
   - It emphasizes that incorporating adaptive step sizes into diffusion-based sampling methods without increasing computational cost is difficult and requires new ideas.

### Summary

Through theoretical analysis and empirical validation, this paper reveals the shortcomings of existing adaptive step size methods for sampling Bayesian Neural Network posteriors and points out directions for future research.
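The effect of dropping the correction term can be illustrated with a one-dimensional toy problem. The sketch below is not the paper's experiment and does not implement PSGLD or SGRLD; it assumes a standard normal target with potential `U(theta) = theta**2 / 2`, a hypothetical RMSprop-like step size `D(theta) = 1 / sqrt(LAM + U'(theta)**2)`, and a hypothetical damping constant `LAM`. For the one-dimensional Itô diffusion with drift `-D * U'` and diffusion coefficient `sqrt(2 * D)`, the zero-flux stationary solution of the Fokker-Planck equation is proportional to `exp(-U) / D`, so a step size that is largest at the mode should carve a dip into the density exactly there. Adding the derivative `D'` to the drift restores `exp(-U)` as the stationary density. The code compares both densities on a grid and checks the prediction with a simple Euler-Maruyama simulation.

```python
import numpy as np

LAM = 0.1  # hypothetical damping constant of the toy step-size rule


def grad_U(theta):
    # Gradient of the assumed potential U(theta) = theta^2 / 2 (standard normal target).
    return theta


def D(theta):
    # Toy RMSprop-like adaptive step size: large where the gradient is small.
    return 1.0 / np.sqrt(LAM + grad_U(theta) ** 2)


def dD(theta):
    # Derivative of D for this specific U; in 1D this is the correction term.
    return -theta / (LAM + theta ** 2) ** 1.5


def simulate(corrected, n_chains=2000, n_steps=10000, h=1e-3, seed=0):
    # Euler-Maruyama discretization of
    #   d(theta) = (-D U' [+ D']) dt + sqrt(2 D) dW,
    # run for many independent chains in parallel.
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_chains)  # start from the target itself
    for _ in range(n_steps):
        drift = -D(theta) * grad_U(theta)
        if corrected:
            drift = drift + dD(theta)
        noise = rng.standard_normal(n_chains)
        theta = theta + h * drift + np.sqrt(2.0 * h * D(theta)) * noise
    return theta


def mass_near_mode(samples, width=0.2):
    # Fraction of samples within `width` of the target's mode at 0.
    return float(np.mean(np.abs(samples) < width))


# Densities on a grid: target exp(-U) versus the uncorrected limit exp(-U) / D.
grid = np.linspace(-4.0, 4.0, 2001)
dx = grid[1] - grid[0]
target = np.exp(-0.5 * grid ** 2)
biased = target / D(grid)
target = target / (target.sum() * dx)
biased = biased / (biased.sum() * dx)

frac_uncorrected = mass_near_mode(simulate(corrected=False))
frac_corrected = mass_near_mode(simulate(corrected=True))

print("target density at 0 and 1:", target[1000], target[1250])
print("biased density at 0 and 1:", biased[1000], biased[1250])
print("mass near mode, uncorrected chain:", frac_uncorrected)
print("mass near mode, corrected chain:  ", frac_corrected)
```

In this toy setting the uncorrected density is lower at the mode than at `theta = 1`, and the uncorrected chains spend noticeably less time near the mode than the corrected ones. The discretization step `h` and the chain counts are arbitrary choices for illustration; in higher dimensions the correction involves the divergence of the preconditioner, which is where the extra computational cost discussed above comes from.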

On the Convergence of Locally Adaptive and Scalable Diffusion-Based Sampling Methods for Deep Bayesian Neural Network Posteriors
