Abstract: This paper presents a Bayesian estimation procedure for single hidden-layer neural networks using $\ell_{1}$ controlled neuron weight vectors. We study the structure of the posterior density that makes it amenable to rapid sampling via Markov chain Monte Carlo (MCMC), and we establish statistical risk guarantees. Let the neural network have $K$ neurons with internal weights of dimension $d$, and fix the outer weights. With $N$ data observations, we use a gain parameter, or inverse temperature, $\beta$ in the posterior density. The posterior is intrinsically multimodal and not naturally suited to the rapid mixing of MCMC algorithms. For a continuous uniform prior over the $\ell_{1}$ ball, we demonstrate that the posterior density can be written as a mixture density whose mixture components are log-concave. Furthermore, when the number of parameters $Kd$ exceeds a constant times $(\beta N)^{2}\log(\beta N)$, the mixing distribution is also log-concave. Thus, neuron parameters can be sampled from the posterior by sampling only from log-concave densities. For a discrete uniform prior restricted to a grid, we study the statistical risk (generalization error) of procedures based on the posterior. Using an inverse temperature that is a fractional power of $1/N$, $\beta = C \left[(\log d)/N\right]^{1/4}$, we demonstrate that notions of squared error are of the 4th root order $O(\left[(\log d)/N\right]^{1/4})$. If one further assumes independent Gaussian data with a variance $\sigma^{2}$ that matches the inverse temperature, $\beta = 1/\sigma^{2}$, we show that the Kullback divergence decays at an improved cube root rate $O(\left[(\log d)/N\right]^{1/3})$. Future work aims to bridge the sampling ability of the continuous uniform prior with the risk control of the discrete uniform prior, resulting in a polynomial time Bayesian training algorithm for neural networks with statistical risk control.
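To get a feel for the quantities in the abstract, the following minimal sketch evaluates the stated formulas numerically. It is illustrative only and is not code from the paper: the function names are hypothetical, and the constants `c` and `const` stand in for the unspecified constant $C$ and the unspecified "constant times" factor, both set to 1 here purely as an assumption.

```python
import math


def inverse_temperature(n_obs, dim, c=1.0):
    """beta = C * ((log d) / N) ** (1/4); c is a placeholder for the unspecified C."""
    return c * (math.log(dim) / n_obs) ** 0.25


def fourth_root_rate(n_obs, dim):
    """Order of the squared-error bound, ((log d) / N) ** (1/4), up to constants."""
    return (math.log(dim) / n_obs) ** 0.25


def cube_root_rate(n_obs, dim):
    """Order of the Kullback divergence bound, ((log d) / N) ** (1/3), up to constants."""
    return (math.log(dim) / n_obs) ** (1.0 / 3.0)


def mixing_threshold(beta, n_obs, const=1.0):
    """const * (beta N)^2 * log(beta N); the abstract requires K * d to exceed this.

    const is a placeholder for the unspecified constant in the abstract.
    """
    bn = beta * n_obs
    return const * bn ** 2 * math.log(bn)


if __name__ == "__main__":
    n_obs, dim = 10_000, 100  # hypothetical sample size and input dimension
    beta = inverse_temperature(n_obs, dim)
    print("beta:", beta)
    print("4th root rate:", fourth_root_rate(n_obs, dim))
    print("cube root rate:", cube_root_rate(n_obs, dim))
    print("K*d threshold (const = 1):", mixing_threshold(beta, n_obs))
```

Whenever $(\log d)/N < 1$, the cube root quantity is smaller than the 4th root quantity, which is the sense in which the Gaussian-data rate is an improvement. Note that the cube root rate is stated for the separate choice $\beta = 1/\sigma^{2}$, so the `beta` computed above applies only to the 4th root setting.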

Challenges in Markov chain Monte Carlo for Bayesian neural networks

Functional Stochastic Gradient MCMC for Bayesian Neural Networks

Variational Bayes Neural Network: Posterior Consistency, Classification Accuracy and Computational Challenges

Bayesian Neural Networks via MCMC: A Python-Based Tutorial

Complex-valued Bayesian parameter estimation via Markov chain Monte Carlo

What Are Bayesian Neural Network Posteriors Really Like?

Multilevel Bayesian Deep Neural Networks

Rapid Bayesian Computation and Estimation for Neural Networks via Mixture Distributions

Function-Space MCMC for Bayesian Wide Neural Networks

MCMC-Based Inference in the Era of Big Data: A Fundamental Analysis of the Convergence Complexity of High-Dimensional Chains

Data Subsampling for Bayesian Neural Networks

Dangers of Bayesian Model Averaging under Covariate Shift

MCMC-Net: Accelerating Markov Chain Monte Carlo with Neural Networks for Inverse Problems

Efficient Bayes Inference in Neural Networks through Adaptive Importance Sampling

Neural Langevin Dynamical Sampling

Log-Concave Coupling for Sampling Neural Net Posteriors

Deep Learning for Marginal Bayesian Posterior Inference with Recurrent Neural Networks

A simple introduction to Markov Chain Monte–Carlo sampling

A Conceptual Introduction to Markov Chain Monte Carlo Methods

Parallel Markov Chain Monte Carlo for Non-Gaussian Posterior Distributions