Abstract: In recent years, a variety of FPGA accelerators have been proposed to speed up convolutional neural networks (CNNs) in many domain-specific applications. In addition, optimization techniques such as fast algorithms and network sparsity have greatly reduced the theoretical computational workload of CNN inference. A few FPGA accelerators currently support both the fast Winograd algorithm (WinoA) and network sparsity to minimize the amount of computation. However, these architectures feed data into the processing elements (PEs) in units of blocks, so some boundary losses caused by sparse irregularities cannot be avoided. Moreover, these works do not discuss design space exploration under sparse conditions. In this article, we propose a novel accelerator called WinoNN. We discuss in detail the challenges of supporting WinoA, weight sparsity, and activation sparsity simultaneously. To minimize the online encoding overhead caused by activation sparsity, we propose an efficient encoding format called multibit mask (MBM). To handle the irregularities of sparse data, we propose a novel Scatter-Compute-Gather method in the hardware design, combined with a freely sliding buffer that enables fine-grained data loading and minimizes boundary waste. Finally, we combine theoretical analysis with experiments to explore the design space, allowing WinoNN to achieve the best performance on a specific FPGA. Our highly scalable design enables us to deploy sparse Winograd accelerators on very small embedded FPGAs, which previous works do not support. Experimental results on VGG16 show that we achieve the highest digital signal processing unit (DSP) efficiency and the highest energy efficiency compared with state-of-the-art sparse architectures.
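To make concrete why combining the Winograd algorithm with sparsity reduces computation, the following is a minimal sketch of the standard one-dimensional Winograd F(2,3) transform in plain Python. It is not the WinoNN hardware design: the function names are hypothetical, the zero-skipping loop only illustrates the general idea of exploiting zeros in the Winograd domain, and the MBM encoding and the Scatter-Compute-Gather method are not modeled.

```python
def transform_filter(g):
    """Map a 3-tap filter into the 4-element Winograd domain."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def transform_input(d):
    """Map a 4-element input tile into the Winograd domain."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3]


def winograd_f23(d, u):
    """Compute two outputs of a 3-tap convolution from a 4-element tile.

    `u` is the already-transformed filter. Element-wise products with a
    zero operand are skipped, which is where sparsity saves work.
    Returns (outputs, number_of_multiplications_performed).
    """
    v = transform_input(d)
    m = [0.0] * 4
    mults = 0
    for i in range(4):
        if u[i] != 0 and v[i] != 0:  # skip products that are known to be zero
            m[i] = u[i] * v[i]
            mults += 1
    # Inverse (output) transform: only additions and subtractions
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]], mults


def direct_conv(d, g):
    """Reference: direct sliding-window computation (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    tile = [1, 2, 3, 4]
    dense = winograd_f23(tile, transform_filter([1, 2, 3]))
    sparse = winograd_f23(tile, transform_filter([1, 0, -1]))
    print(dense, direct_conv(tile, [1, 2, 3]))
    print(sparse, direct_conv(tile, [1, 0, -1]))
```

For a dense filter, the sketch uses 4 multiplications where the direct method uses 6. For the filter [1, 0, -1], two of the four transformed weights are zero, so only 2 multiplications remain.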

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm
