A High-Performance Hardware Accelerator for Sparse Convolutional Neural Network on FPGA

Jianwei Zhang, Fen Xu, Jinghu Li
DOI: https://doi.org/10.1109/iccc56324.2022.10065907
2022-01-01
Abstract: Convolutional neural networks (CNNs) have developed rapidly in recent years and shown excellent performance in a wide range of applications, and the design of hardware accelerators for them has attracted great attention. As the number of layers in a CNN increases, so does the amount of model computation. The high computation and memory requirements can limit the deployment of CNNs on resource-limited hardware accelerators such as FPGAs, so pruning is needed to compress network models for deployment on FPGAs. In this paper, we compress CNN models with a sparsification approach that combines convolutional kernel pruning and weight pruning. Together with 16-bit fixed-point quantization, zero-value skipping, and independent multiply-and-accumulate (MAC) modules, this yields an FPGA-based high-performance hardware accelerator design for sparse convolutional neural networks. The LeNet-5 network is implemented as an example on the Xilinx Virtex VC-707 FPGA board. Experimental results from inference on the 10,000 images of the MNIST test set show that, at a network sparsity of 87.46%, the inference time is 82.43% lower than on the CPU and the energy efficiency is 264.85 times that of the CPU implementation.
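The techniques named in the abstract (weight pruning, 16-bit fixed-point quantization, and zero-value skipping in the MAC stage) can be illustrated with a small software sketch. This is not the authors' hardware design. The Q8.8 split of the 16 bits, the magnitude-threshold pruning rule, and all function names are assumptions made for illustration; the abstract states only that 16-bit fixed point and a combination of kernel and weight pruning are used.

```python
def quantize_fixed16(x, frac_bits=8):
    """Round a float to signed 16-bit fixed point (assumed Q8.8 by default)."""
    q = int(round(x * (1 << frac_bits)))
    return max(-32768, min(32767, q))  # saturate to the int16 range


def prune_by_magnitude(weights, threshold):
    """Zero out weights whose magnitude is below the threshold (assumed rule)."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def sparse_mac(weights_q, activations_q, frac_bits=8):
    """Multiply-accumulate that skips zero weights.

    Returns the fixed-point result and the number of multiplications
    actually performed.
    """
    acc = 0
    mac_count = 0
    for w, a in zip(weights_q, activations_q):
        if w == 0:
            continue  # zero-value skipping: no multiply for pruned weights
        acc += w * a
        mac_count += 1
    # The product of two Q8.8 values has 16 fractional bits; shift back to 8.
    return acc >> frac_bits, mac_count


weights = prune_by_magnitude([0.5, 0.01, -0.25, 0.0], threshold=0.05)
weights_q = [quantize_fixed16(w) for w in weights]
activations_q = [quantize_fixed16(a) for a in [1.0, 2.0, 4.0, 3.0]]
result_q, mac_count = sparse_mac(weights_q, activations_q)
print(result_q / 256, mac_count)
```

In this toy case two of the four weights are zero after pruning, so only two multiplications are performed. The same idea, applied at high sparsity, is what lets an accelerator avoid most of the MAC work.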