Abstract: Bayesian methods hold significant promise for improving the uncertainty quantification ability and robustness of deep neural network models. Recent research has seen the investigation of a number of approximate Bayesian inference methods for deep neural networks, building on both the variational Bayesian and Markov chain Monte Carlo (MCMC) frameworks. A fundamental issue with MCMC methods is that the improvements they enable are obtained at the expense of increased computation time and model storage costs. In this paper, we investigate the potential of sparse network structures to flexibly trade off model storage costs and inference run time against predictive performance and uncertainty quantification ability. We use stochastic gradient MCMC methods as the core Bayesian inference method and consider a variety of approaches for selecting sparse network structures. Surprisingly, our results show that certain classes of randomly selected substructures can perform as well as substructures derived from state-of-the-art iterative pruning methods while drastically reducing model training times.

### What problem does this paper attempt to solve?

This paper addresses the high computation time and model storage cost incurred when using Markov chain Monte Carlo (MCMC) methods in Bayesian deep learning (BDL). Specifically:

1. **High computational cost and storage requirements of MCMC methods**
   - Although MCMC methods can provide an unbiased posterior representation of the model, their improvements come at the cost of increased computation time and model storage.
   - For large-scale deep neural networks, the efficiency of MCMC decreases as the number of model parameters increases.
2. **Trade-off between model compactness and inference efficiency**
   - The paper studies how sparse network structures affect predictive performance and uncertainty quantification ability while reducing model storage cost and inference run time.
   - In particular, the authors compare different methods for selecting sparse network structures, including iterative pruning and randomly selected substructures.
3. **Improving inference efficiency and model compression**
   - By performing inference within sparse substructures, the authors aim to significantly reduce the demand for computational resources while maintaining predictive performance.
   - This matters most for applications that are sensitive to computation time and latency, such as embedded, mobile, and Internet of Things (IoT) applications.

### Main research questions

- **Selection of sparse substructures**: How do different sparse substructure selection methods (such as iterative pruning, random layer-level masks, and random global masks) affect the model's predictive performance, uncertainty, and inference efficiency?
- **Trade-off between sparsity and performance**: How do model accuracy, negative log-likelihood (NLL), expected calibration error (ECE), and other metrics change at different sparsity rates?
- **Training time and inference acceleration**: Can sparse substructures significantly reduce training time and inference time, especially at high sparsity rates?

### Experimental design

To answer these questions, the paper conducts the following experiments:

1. **Performance of SGHMC on optimized sparse substructures**
   - SGHMC applied to fully connected models is compared with SGHMC applied to sparse substructures obtained by iterative pruning (IP-SGHMC) and iterative pruning with rewinding (IPR-SGHMC).
2. **Comparison between random sparse substructures and optimized sparse substructures**
   - SGHMC applied to randomly generated sparse substructures (such as RLM(F) and RGM) is compared with SGHMC applied to optimized sparse substructures (such as IPR-SGHMC).

Through these experiments, the paper shows that certain types of randomly selected substructures can achieve performance comparable to state-of-the-art iterative pruning methods while significantly reducing model training time. This suggests a new direction for efficient inference in Bayesian deep learning.
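The summary names random layer-level masks and random global masks but does not define them. The following is a minimal sketch of one plausible reading, not the paper's implementation: a layer-level mask keeps the same fraction of weights in every layer, while a global mask draws the kept positions uniformly over all parameters, so the kept fraction can differ from layer to layer. The function names, the toy layer shapes, and the flat list-of-booleans mask representation are all assumptions made for illustration.

```python
import random
from math import prod


def random_layer_masks(shapes, sparsity, rng):
    """Assumed 'layer-level' scheme: every layer is pruned to the same sparsity."""
    masks = []
    for shape in shapes:
        n = prod(shape)
        n_keep = n - round(sparsity * n)           # weights kept in this layer
        kept = set(rng.sample(range(n), n_keep))   # positions chosen uniformly
        masks.append([i in kept for i in range(n)])
    return masks


def random_global_masks(shapes, sparsity, rng):
    """Assumed 'global' scheme: kept positions are drawn over all layers at once."""
    sizes = [prod(shape) for shape in shapes]
    total = sum(sizes)
    n_keep = total - round(sparsity * total)       # weights kept in the whole network
    kept = set(rng.sample(range(total), n_keep))
    masks, start = [], 0
    for size in sizes:
        # Slice the global selection back into one flat mask per layer.
        masks.append([(start + i) in kept for i in range(size)])
        start += size
    return masks


# Hypothetical three-layer network, 90% sparsity.
shapes = [(20, 10), (10, 5), (5, 2)]
rng = random.Random(0)
layer_masks = random_layer_masks(shapes, 0.9, rng)
global_masks = random_global_masks(shapes, 0.9, rng)

print("kept per layer (layer-level):", [sum(m) for m in layer_masks])
print("kept per layer (global):     ", [sum(m) for m in global_masks])
```

Under this reading, both schemes keep the same total number of weights. Only the global scheme lets small layers end up with very few or even zero kept weights, which is one reason the two might behave differently at high sparsity. In a sampler, such a mask would presumably be fixed before inference and used to hold the masked-out weights at zero.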

Impact of Parameter Sparsity on Stochastic Gradient MCMC Methods for Bayesian Deep Learning
