Abstract: Deep convolutional neural networks (CNNs) have achieved remarkable performance at the cost of huge computation. As CNN models become deeper and more complex, compressing them into sparse models by pruning redundant connections has emerged as an attractive way to reduce computation and memory requirements. Meanwhile, FPGAs have been demonstrated to be an effective hardware platform for accelerating CNN inference. However, most existing FPGA accelerators target dense CNN models and are inefficient when executing sparse models, because most of their arithmetic operations are additions and multiplications with zero operands. In this work, we propose an accelerator for sparse CNNs on FPGAs based on software-hardware co-design. To deal efficiently with the irregular connections in sparse convolutional layers, we propose a weight-oriented dataflow that uses element-matrix multiplication as its key operation. Each weight is processed individually, which yields low decoding overhead. We then design an FPGA accelerator that features a tile look-up table (TLUT) and a channel multiplexer (CMUX). The TLUT matches the indices of sparse weights to those of input pixels, so that runtime decoding overhead is mitigated by an efficient indexing operation. Moreover, we propose a weight layout that enables conflict-free on-chip memory access, and a CMUX is inserted to locate addresses under this layout. Finally, we build a neural architecture search (NAS) engine that leverages the reconfigurability of FPGAs to generate an efficient CNN model and choose the optimal hardware design parameters. Experiments demonstrate that our accelerator achieves 223.4-309.0 GOP/s for modern CNNs on the Xilinx ZCU102, a 2.4x-12.9x speedup over previous dense CNN accelerators on FPGAs. Our FPGA-aware NAS approach shows a 2x speedup over MobileNetV2 with 1.5% accuracy loss.
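
The abstract does not spell out the weight-oriented dataflow, so the following is only a minimal software sketch of one plausible reading of it, not the accelerator's actual design. It assumes stride 1 and no padding, and the function names (`dense_conv2d`, `compress`, `sparse_conv2d`) and the `(out_ch, in_ch, row, col, value)` storage format are hypothetical choices made for illustration. Each nonzero weight is multiplied as a scalar with a shifted window (a matrix) of one input channel and accumulated into one output channel, so zero weights are never touched.

```python
def dense_conv2d(x, w):
    """Reference dense convolution (stride 1, no padding).

    x: input as nested lists  [C_in][H][W]
    w: weights as nested lists [C_out][C_in][K][K]
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    h_out, w_out = h - k + 1, wd - k + 1
    y = [[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    for o in range(c_out):
        for i in range(c_in):
            for r in range(k):
                for s in range(k):
                    for p in range(h_out):
                        for q in range(w_out):
                            y[o][p][q] += w[o][i][r][s] * x[i][p + r][q + s]
    return y


def compress(w):
    """Keep only nonzero weights as (out_ch, in_ch, row, col, value).

    This coordinate format is an assumption for illustration only.
    """
    return [(o, i, r, s, v)
            for o, filt in enumerate(w)
            for i, kern in enumerate(filt)
            for r, row in enumerate(kern)
            for s, v in enumerate(row) if v != 0]


def sparse_conv2d(x, nonzeros, c_out, k):
    """Weight-oriented sparse convolution (stride 1, no padding).

    For every nonzero weight, one scalar is multiplied with an
    h_out x w_out window of input channel i, shifted by (r, s),
    and accumulated into output channel o.
    """
    h, wd = len(x[0]), len(x[0][0])
    h_out, w_out = h - k + 1, wd - k + 1
    y = [[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    for o, i, r, s, v in nonzeros:
        for p in range(h_out):
            for q in range(w_out):
                y[o][p][q] += v * x[i][p + r][q + s]
    return y
```

In this sketch the work scales with the number of nonzero weights rather than with the full kernel size, and the only per-weight decoding is reading the `(row, col)` offset that selects the input window. A hardware index-matching structure such as the TLUT would take over that role, but its organization is not described in the abstract and is not modeled here.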