Exploiting Near-Memory Processing Architectures for Bayesian Neural Networks Acceleration

Yinglin Zhao, Jianlei Yang, Xiaotao Jia, Xueyan Wang, Zhaohao Wang, Wang Kang, Youguang Zhang, Weisheng Zhao
DOI: https://doi.org/10.1109/ISVLSI.2019.00045
2019-01-01
Abstract: Bayesian inference is an effective approach to capturing model uncertainty and tackling the over-fitting problem in deep neural networks. Bayesian neural networks (BNNs) have recently become increasingly popular and have succeeded in many recognition tasks. However, BNN inference requires numerous memory accesses because of the sampled networks it produces. In this paper, a near-memory architecture is proposed for accelerating BNN inference by introducing additional memory units near the processing units. The near-memory architecture can cache frequently accessed data to reduce data movement efficiently. Minimizing the expensive data movement between memory units and computation units cuts down latency and energy consumption. Compared with the traditional approach, simulation results show that the proposed architecture reduces energy consumption by 9% and achieves a 1.6× speedup at the cost of 4% area overhead.
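To illustrate why sampling inflates memory traffic, the following is a minimal sketch of Monte Carlo BNN inference for a single linear layer. It is not the architecture proposed in the paper. It assumes a Gaussian distribution over each weight (mean `mu`, standard deviation `sigma`), and all function and variable names are hypothetical. The counter `param_reads` is only a rough proxy for memory accesses: every sampled network re-reads both parameters of every weight, which is the kind of repeated access a near-memory cache could serve.

```python
import random


def sample_weights(mu, sigma, rng):
    """Draw one weight matrix W = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_row, sigma_row)]
        for mu_row, sigma_row in zip(mu, sigma)
    ]


def linear(weights, x):
    """Compute y = W x for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def bnn_predict(mu, sigma, x, num_samples, seed=0):
    """Average the outputs of num_samples sampled networks.

    Returns the mean output and the number of parameter reads
    (an assumed, simplified proxy for memory accesses).
    """
    rng = random.Random(seed)
    n_out, n_in = len(mu), len(mu[0])
    mean = [0.0] * n_out
    param_reads = 0
    for _ in range(num_samples):
        w = sample_weights(mu, sigma, rng)
        param_reads += 2 * n_out * n_in  # one mu and one sigma per weight
        y = linear(w, x)
        mean = [acc + yi / num_samples for acc, yi in zip(mean, y)]
    return mean, param_reads


mu = [[0.5, -0.2], [0.1, 0.3]]
sigma = [[0.1, 0.1], [0.1, 0.1]]
mean, reads = bnn_predict(mu, sigma, [1.0, 2.0], num_samples=10)
print(mean, reads)
```

Under these assumptions the parameter reads grow linearly with the number of samples, whereas a deterministic network would read each weight once per input.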