Impact of Parameter Sparsity on Stochastic Gradient MCMC Methods for Bayesian Deep Learning

Meet P. Vadera, Adam D. Cobb, Brian Jalaian, Benjamin M. Marlin
DOI: https://doi.org/10.48550/arXiv.2202.03770
2022-02-08
Abstract: Bayesian methods hold significant promise for improving the uncertainty quantification ability and robustness of deep neural network models. Recent research has seen the investigation of a number of approximate Bayesian inference methods for deep neural networks, building on both the variational Bayesian and Markov chain Monte Carlo (MCMC) frameworks. A fundamental issue with MCMC methods is that the improvements they enable are obtained at the expense of increased computation time and model storage costs. In this paper, we investigate the potential of sparse network structures to flexibly trade off model storage costs and inference run time against predictive performance and uncertainty quantification ability. We use stochastic gradient MCMC methods as the core Bayesian inference method and consider a variety of approaches for selecting sparse network structures. Surprisingly, our results show that certain classes of randomly selected substructures can perform as well as substructures derived from state-of-the-art iterative pruning methods while drastically reducing model training times.
Machine Learning, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper addresses the high computation time and model storage cost incurred when using Markov chain Monte Carlo (MCMC) methods in Bayesian deep learning (BDL). Specifically:

1. **High computational cost and storage requirements of MCMC methods**:
   - Although MCMC methods can provide an unbiased representation of the posterior, the improvements they enable come at the cost of increased computation time and model storage.
   - For large-scale deep neural networks, MCMC becomes less efficient as the number of model parameters grows.
2. **Trade-off between model compactness and inference efficiency**:
   - The paper studies how sparse network structures affect predictive performance and uncertainty quantification ability while reducing model storage cost and inference run time.
   - In particular, the authors compare different methods for selecting sparse network structures, including iterative pruning and randomly selected substructures.
3. **Improving inference efficiency and model compression**:
   - By performing inference within sparse substructures, the authors aim to significantly reduce the demand for computational resources while maintaining predictive performance.
   - This matters most for applications that are sensitive to computation time and latency, such as embedded, mobile, and Internet of Things (IoT) applications.

### Main research questions

- **Selection of sparse substructures**: How do different sparse substructure selection methods (such as iterative pruning, random layer-level masks, and random global masks) affect the model's predictive performance, uncertainty quantification, and inference efficiency?
- **Trade-off between sparsity and performance**: How do accuracy, negative log-likelihood (NLL), expected calibration error (ECE), and other metrics change at different sparsity rates?
- **Training time and inference acceleration**: Can sparse substructures significantly reduce training time and inference time, especially at high sparsity rates?

### Experimental design

To answer these questions, the paper conducts the following experiments:

1. **Performance of SGHMC on optimized sparse substructures**:
   - SGHMC applied to fully connected models is compared with SGHMC applied to sparse substructures obtained by iterative pruning (IP-SGHMC) and iterative pruning with rewinding (IPR-SGHMC).
2. **Comparison between random sparse substructures and optimized sparse substructures**:
   - SGHMC applied to randomly generated sparse substructures (such as RLM(F) and RGM) is compared with SGHMC applied to optimized sparse substructures (such as IPR-SGHMC).

Through these experiments, the paper shows that certain types of randomly selected substructures can match the performance of state-of-the-art iterative pruning methods while significantly reducing model training time. This suggests a new direction for efficient inference in Bayesian deep learning.
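The difference between a random global mask and a random layer-level mask can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a global mask samples the kept weights from the whole network at once, while a layer-level mask keeps the same fraction of weights in every layer. The function names, the flattened `layer_sizes`, and the 90% sparsity rate are all hypothetical choices made for illustration.

```python
import random


def random_global_mask(layer_sizes, sparsity, seed=0):
    """Assumed reading of a random global mask: sample the kept weights
    from the whole network at once, so per-layer density can vary."""
    rng = random.Random(seed)
    total = sum(layer_sizes)
    n_keep = round(total * (1.0 - sparsity))
    kept = set(rng.sample(range(total), n_keep))
    masks, offset = [], 0
    for size in layer_sizes:
        # Slice the global index range back into one 0/1 mask per layer.
        masks.append([1 if offset + i in kept else 0 for i in range(size)])
        offset += size
    return masks


def random_layerwise_mask(layer_sizes, sparsity, seed=0):
    """Assumed reading of a random layer-level mask: keep the same
    fraction of weights in every layer, chosen uniformly at random."""
    rng = random.Random(seed)
    masks = []
    for size in layer_sizes:
        n_keep = round(size * (1.0 - sparsity))
        kept = set(rng.sample(range(size), n_keep))
        masks.append([1 if i in kept else 0 for i in range(size)])
    return masks


# Hypothetical flattened layer sizes for a small three-layer network.
layer_sizes = [2000, 1000, 100]
global_masks = random_global_mask(layer_sizes, sparsity=0.9)
layer_masks = random_layerwise_mask(layer_sizes, sparsity=0.9)

# Both masks keep 10% of all weights; only the layer-level mask
# guarantees 10% in each individual layer.
print([sum(m) for m in global_masks])
print([sum(m) for m in layer_masks])
```

Under these assumptions, a sampler such as SGHMC would multiply both the parameters and their updates by the mask, so that pruned weights stay at zero and only the kept weights need to be stored for each posterior sample.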