Abstract: Field-programmable gate arrays (FPGAs) are widely considered a promising platform for accelerating convolutional neural networks (CNNs). However, the large number of parameters in CNNs imposes heavy computation and memory burdens on FPGA-based CNN implementations. To address this problem, this paper proposes an optimized compression strategy and implements an FPGA-based accelerator for CNNs. First, a reversed-pruning strategy is proposed that reduces the number of parameters of AlexNet by 13× without accuracy loss on the ImageNet dataset. Peak-pruning is further introduced to achieve better compressibility. Moreover, quantization provides another 4× reduction with negligible loss of accuracy. Second, efficient storage techniques are presented for the convolutional layer and the fully connected layer, respectively, each aimed at reducing that layer's overall cache overhead. Finally, the effectiveness of the proposed strategy is verified by an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format of sparse data, we reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, our accelerator achieves an overall performance of 9.73 fps on the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, and 822.0× and 15.8× improvements in energy efficiency, respectively. This novel compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory (LSTM) networks, and recurrent neural networks (RNNs).
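The abstract names the prune, quantize, and sparse-storage stages without describing them, so the following is only a minimal sketch of a generic pipeline of that shape, not the paper's method. It swaps in plain magnitude pruning as a stand-in for reversed-pruning and peak-pruning, assumes 8-bit symmetric linear quantization (one way to obtain a 4× reduction from 32-bit floats), and assumes a relative-index (gap, value) sparse format in the spirit of Deep Compression, with filler zeros when a run of zeros exceeds the 8-bit gap range. All function names are hypothetical.

```python
import random

# Hypothetical sketch of a generic prune -> quantize -> sparse-encode pipeline.
# Magnitude pruning here is a stand-in, NOT the paper's reversed/peak-pruning.


def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Assumed symmetric linear quantization to integer codes in [-127, 127].

    Storing 8-bit codes instead of 32-bit floats shrinks values by 4x.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def encode_relative(codes, max_gap=255):
    """Store nonzeros as (gap, value) pairs; gap counts the zeros skipped.

    Gaps must fit in 8 bits, so a longer zero run is split by a filler entry
    (max_gap, 0) covering max_gap zeros plus one explicitly stored zero.
    Trailing zeros are not stored; the decoder needs the original length.
    """
    entries = []
    gap = 0
    for v in codes:
        if v == 0:
            gap += 1
            continue
        while gap > max_gap:
            entries.append((max_gap, 0))
            gap -= max_gap + 1
        entries.append((gap, v))
        gap = 0
    return entries


def decode_relative(entries, length):
    """Invert encode_relative, padding trailing zeros up to `length`."""
    out = []
    for gap, v in entries:
        out.extend([0] * gap)
        out.append(v)
    out.extend([0] * (length - len(out)))
    return out


def compression_ratio(length, entries):
    """Dense float32 bytes vs. 1-byte gap + 1-byte code per stored entry."""
    if not entries:
        return float("inf")
    dense_bytes = 4 * length
    sparse_bytes = 2 * len(entries)  # the single 4-byte scale is ignored
    return dense_bytes / sparse_bytes


random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # toy layer
pruned = magnitude_prune(weights, sparsity=12 / 13)  # keep ~1/13 of weights
codes, scale = quantize_int8(pruned)
entries = encode_relative(codes)

assert decode_relative(entries, len(codes)) == codes  # lossless w.r.t. codes
print("kept weights:", sum(1 for c in codes if c != 0))
print("toy compression ratio: %.1fx" % compression_ratio(len(codes), entries))
print("reported AlexNet ratio: %.1fx" % (243 / 8.7))  # 243 MB -> 8.7 MB, ~27.9x
```

On these Gaussian toy weights, keeping roughly 1/13 of them shrinks storage by more than an order of magnitude. The exact ratio depends on the weight distribution and the index width, and the sketch is not meant to reproduce the 28× reported for AlexNet, which also reflects the paper's own pruning and per-layer storage formats.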

An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

A dedicated hardware accelerator for real-time acceleration of YOLOv2

An FPGA-Based Reconfigurable CNN Accelerator for YOLO

An FPGA-Based Reconfigurable CNN Training Accelerator Using Decomposable Winograd

A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm

Optimizing Convolutional Neural Network Accelerator on Low-Cost FPGA

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes

Optimized Compression for Implementing Convolutional Neural Networks on FPGA

A High Energy Efficiency and Low Resource Consumption FPGA Accelerator for Convolutional Neural Network

Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks

FPGA-based Accelerator for Convolutional Neural Network

FPGA Hardware Acceleration Design for Deep Learning

Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs

SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs

Design Implementation of FPGA-Based Neural Network Acceleration

Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs

A High-performance Inference Accelerator Exploiting Patterned Sparsity in CNNs

An OpenCL-Based FPGA Accelerator for Faster R-CNN

Instruction driven cross-layer CNN accelerator with Winograd transformation on FPGA

An Efficient Convolutional Neural Network Accelerator on FPGA