A Flexible Sparsity-Aware Accelerator with High Sensitivity and Efficient Operation for Convolutional Neural Networks

Yuan, Haiying,Zeng, Zhiyong,Cheng, Junpeng,Li, Minghao
DOI: https://doi.org/10.1007/s00034-022-01992-x
2022-01-01
Abstract:In view of the technical challenge that convolutional neural networks involve in a large amount of computation caused by the information redundancy of the interlayer activation, a flexible sparsity-aware accelerator is proposed in this paper. It realizes the basic data transmission with coarse-grained control and realizes the transmission of sparse data with fine-grained control. In addition, the corresponding data arrangement scheme is designed to fully utilize the off-chip bandwidth. In order to improve the inference performance without accuracy reduction, the sparse activation is compressed to eliminate ineffectual activation while preserving topology information with the sparsity perceptron module. To improve power efficiency, the computational load is rationally allocated for multiplication accumulator array, and the convolution operation is decoupled by adder tree with FIFO. The accelerator is implemented on Xilinx VCU108, and 97.27% of the operations are non-zero activation operations. The accelerator running in sparsity mode is more than 2.5 times faster than that in density mode, and power consumption is reduced to 8.3 W. Furthermore, this flexible sparsity-aware accelerator architecture can be widely applied to large-scale deep convolutional neural networks.
What problem does this paper attempt to address?