A Reconfigurable Accelerator for Sparse Convolutional Neural Networks

Weijie You, Chang Wu
DOI: https://doi.org/10.1145/3289602.3293945
2019-01-01
Abstract: Convolutional Neural Networks (CNNs) have proven very useful in image recognition and other AI applications, but they are usually computationally intensive. To address this heavy computational demand, researchers have proposed network compression methods that reduce the number of synaptic weights and the amount of computation. In this paper, we propose an input-row-based sparse convolutional neural network accelerator on FPGAs that performs sparse CNN computation efficiently. Like the DNNWEAVER architecture, our accelerator uses a two-level hierarchy: multiple Processing Units (PUs), each comprising a set of basic Processing Elements (PEs). Both the number of PEs per PU and the number of PUs in a design are reconfigurable, so each CNN can be mapped for best performance. Unlike Cambricon-X, our architecture does not require a large multiplexer for data selection, which makes it better suited to larger, higher-performance accelerator designs. In addition, we propose a weight merging method that balances the computation load across PUs to maximize overall computational efficiency. For evaluation, we implement our design with 32 PUs of 14 PEs each. Running at 100MHz on a Xilinx ZC706 board, it reaches 297 GOPS, an overall 3.6x speedup over the DNNWEAVER implementation for the non-sparse VGG16 network.
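The abstract does not describe how weight merging works. As a purely illustrative stand-in, the sketch below balances per-PU load with longest-processing-time-first (LPT) greedy scheduling. It assumes that a filter's cost on a PU is proportional to its number of non-zero weights, and it assigns the densest filters first, each to the currently least-loaded PU. The functions `balance_filters` and `pu_loads` and the example counts are hypothetical. This is not the authors' method.

```python
import heapq
from typing import List


def balance_filters(nnz_per_filter: List[int], num_pus: int) -> List[List[int]]:
    """Assign filter indices to PUs so that each PU's total non-zero count is balanced.

    Hypothetical stand-in: LPT greedy scheduling, not the paper's weight merging.
    """
    if num_pus <= 0:
        raise ValueError("num_pus must be positive")
    # Min-heap of (current non-zero load, PU index); the least-loaded PU pops first.
    heap = [(0, pu) for pu in range(num_pus)]
    heapq.heapify(heap)
    groups: List[List[int]] = [[] for _ in range(num_pus)]
    # Visit filters from densest to sparsest.
    order = sorted(range(len(nnz_per_filter)),
                   key=lambda f: nnz_per_filter[f], reverse=True)
    for f in order:
        load, pu = heapq.heappop(heap)
        groups[pu].append(f)
        heapq.heappush(heap, (load + nnz_per_filter[f], pu))
    return groups


def pu_loads(nnz_per_filter: List[int], groups: List[List[int]]) -> List[int]:
    """Total non-zero weights assigned to each PU."""
    return [sum(nnz_per_filter[f] for f in g) for g in groups]


if __name__ == "__main__":
    nnz = [9, 7, 6, 5, 4, 3, 2, 1]  # hypothetical non-zero weight counts per filter
    groups = balance_filters(nnz, num_pus=2)  # 2 PUs for readability; the paper's design uses 32
    print(groups, pu_loads(nnz, groups))  # [[0, 3, 5, 6], [1, 2, 4, 7]] [19, 18]
```

With these hypothetical counts, a contiguous split would place 27 non-zeros on one PU and 10 on the other. The greedy assignment yields loads of 19 and 18. Assuming the PUs run in parallel and the most heavily loaded PU bounds latency, the balanced assignment finishes sooner.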