An Efficient Channel-Aware Sparse Binarized Neural Networks Inference Accelerator

Qingliang Liu, Jinmei Lai, Jiabao Gao
DOI: https://doi.org/10.1109/tcsii.2021.3119369
2021-01-01
IEEE Transactions on Circuits & Systems II Express Briefs
Abstract: Binarized neural network (BNN) inference accelerators show great promise in cost- and power-restricted domains. However, their performance is still severely limited by significant redundancy in BNN inference. In this brief, we introduce a channel-aware sparse accelerator (CAA) that alleviates the performance degradation caused by this redundancy while maintaining the original accuracy. First, motivated by the observation that the convolutions of our rebuilt rectangle kernels contain many redundant operations that can be skipped by exploiting a BNN-specific property, we employ a rectangle kernel simplification strategy to convert the original XNOR-popcount convolutions of each neuron into channel-aware-popcount (CAP) operations for all binarized convolutional and fully-connected layers in CAA, eliminating the unnecessary operations. These CAP operations produce the final output directly, without any extra steps. Furthermore, inspired by our observations on two specific properties of the CAP operations, we adopt a group pruning approach to skip the remaining redundant CAP operations. Experimental results show that our design, evaluated on an embedded FPGA, achieves 4.2-6.6× inference speedup, 3.6-5.5× energy-efficiency improvement, and 1.35× resource-efficiency improvement compared with state-of-the-art works.
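For readers unfamiliar with the baseline that the abstract starts from, the following is a minimal sketch of a standard XNOR-popcount dot product, the operation that the CAP operations are said to replace. It is not the paper's CAP operation or its rectangle kernel simplification, neither of which is specified in the abstract. The function names (`pack`, `xnor_popcount_dot`, `reference_dot`), the bit encoding (+1 as bit 1, -1 as bit 0), and the sample vectors are assumptions made for illustration.

```python
# Illustrative sketch only: the conventional XNOR-popcount dot product used
# in BNNs, NOT the CAP operation proposed in the paper.
# Assumed encoding: +1 -> bit 1, -1 -> bit 0.

def pack(values):
    """Pack a sequence of +1/-1 values into an integer bit vector."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors given as packed bits.

    XNOR marks the positions where the two signs agree; each agreement
    contributes +1 and each disagreement -1, so
    dot = matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask     # 1 where the two bits are equal
    matches = bin(xnor).count("1")       # popcount
    return 2 * matches - n


def reference_dot(a, b):
    """Plain integer dot product, used to check the bitwise version."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical sample activations and weights.
A = [1, -1, 1, 1, -1, -1, 1, -1]
B = [1, 1, -1, 1, -1, 1, 1, -1]

RESULT = xnor_popcount_dot(pack(A), pack(B), len(A))
```

Here `RESULT` agrees with `reference_dot(A, B)`. The sketch shows why a binarized layer needs only bitwise logic and a bit count per neuron, which is the workload where the abstract reports redundant operations.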