Abstract: Fine-grained sparse convolutional neural networks (CNNs) achieve a better trade-off between model accuracy and model size than coarse-grained sparse CNNs. Because of their irregular data structures and unbalanced computation loads, however, fine-grained sparse CNNs struggle to realize their computation and storage advantages on general-purpose edge hardware. Existing custom sparse accelerators, in turn, are designed to emulate a balanced load through software or computational strategies, and they neglect to explore how the computing architecture itself can adapt to fine-grained sparse models and exploit their parallelism. To address these challenges, a cross-mesh network-on-chip (NoC)-based accelerator architecture is proposed. This architecture matches the irregular characteristics of fine-grained sparse CNN weights and enhances the spatio-temporal parallelism of fine-grained sparse CNNs. First, a sparse multiplier unit (SMU) array and an adder array are designed to execute the multiplication and accumulation operations of convolution in parallel. Then, nonzero weight multiplications are unrolled element-wise and mapped to the SMU array to provide more flexible spatial parallelism. A horizontal and vertical cross-mesh NoC is proposed for flexible dataflow scheduling between the SMU and adder arrays, further improving temporal parallelism. This architecture allows the multiplication and accumulation operations in convolution to be decoupled and pipelined with negligible latency. Finally, the proposed accelerator architecture is implemented on the ZU9EG platform. The experimental results show that the proposed accelerator achieves frame rates of 509.9, 249.3, 100.7, 48.4, and 168.9 frames per second (FPS) for AlexNet, VGG-16, ResNet-18, MobileNet-v2, and EfficientNet, respectively. Compared with related works, this accelerator improves inference speed by 1.1$\times$ to 36.1$\times$ and energy efficiency by 2.4$\times$ to 13.4$\times$.
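The abstract does not spell out the hardware mapping, so the following is only a minimal software analogy of the idea of decoupling multiplication from accumulation, assuming stride 1, no padding, and a dense weight tensor that is first unrolled element-wise into nonzero tuples. All function names are hypothetical. The multiply stage stands in for the SMU array, routing partial products by their output tag stands in for the cross-mesh NoC, and the accumulate stage stands in for the adder array; the sketch does not model the paper's actual scheduling or timing.

```python
def unroll_nonzero_weights(weights):
    """Flatten weights[oc][ic][ky][kx] into (oc, ic, ky, kx, w) tuples,
    keeping only nonzero entries (element-wise unroll)."""
    nonzeros = []
    for oc, per_oc in enumerate(weights):
        for ic, kernel in enumerate(per_oc):
            for ky, row in enumerate(kernel):
                for kx, w in enumerate(row):
                    if w != 0:
                        nonzeros.append((oc, ic, ky, kx, w))
    return nonzeros


def multiply_stage(inputs, nonzeros, out_h, out_w):
    """'SMU array' analogue: each nonzero weight is an independent job that
    multiplies every input pixel it touches and emits tagged partial products
    (oc, y, x, product). No accumulation happens in this stage."""
    for oc, ic, ky, kx, w in nonzeros:
        for y in range(out_h):
            for x in range(out_w):
                yield oc, y, x, w * inputs[ic][y + ky][x + kx]


def accumulate_stage(partial_products, out_c, out_h, out_w):
    """'Adder array' analogue: sums partial products by their output tag.
    Tag-based routing stands in for the NoC; arrival order does not matter."""
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(out_c)]
    for oc, y, x, p in partial_products:
        out[oc][y][x] += p
    return out


def sparse_conv2d(inputs, weights):
    """Stride-1, unpadded convolution of inputs[ic][y][x] with weights[oc][ic][ky][kx]."""
    in_h, in_w = len(inputs[0]), len(inputs[0][0])
    k_h, k_w = len(weights[0][0]), len(weights[0][0][0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    nonzeros = unroll_nonzero_weights(weights)
    # multiply_stage is a generator, so products stream into the adders as soon
    # as they are produced: a software analogue of a pipelined multiply/add split.
    products = multiply_stage(inputs, nonzeros, out_h, out_w)
    return accumulate_stage(products, len(weights), out_h, out_w)


if __name__ == "__main__":
    x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 input channel, 3x3
    w = [[[[1, 0], [0, -1]]]]                 # 1 output channel, 2x2, 50% zeros
    print(len(unroll_nonzero_weights(w)))     # 2
    print(sparse_conv2d(x, w))                # [[[-4, -4], [-4, -4]]]
```

Only the two nonzero weights generate multiplication work, while the accumulation side is unaware of how the products were scheduled. That separation is what lets the two stages run independently in the analogy.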

A Sparse Convolution Neural Network Accelerator with Bandwidth-Efficient Data Loopback Structure

A Flexible Sparsity-Aware Accelerator with High Sensitivity and Efficient Operation for Convolutional Neural Networks

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

An Efficient Accelerator for Sparse Convolutional Neural Networks

Design of Sparse Convolutional Neural Network Accelerator

A Reconfigurable Accelerator for Sparse Convolutional Neural Networks

An Efficient and Flexible Accelerator Design for Sparse Convolutional Neural Networks

A Low-Power Sparse Convolutional Neural Network Accelerator with Pre-Encoding Radix-4 Booth Multiplier

An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs

An Efficient Sparse CNNs Accelerator on FPGA

A Convolutional Spiking Neural Network Accelerator with the Sparsity-Aware Memory and Compressed Weights

FPGA Accelerator for CNN: an Exploration of the Kernel Structured Sparsity and Hybrid Arithmetic Computation

SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks

An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm

Design of a Convolutional Neural Network Accelerator Based on On-Chip Data Reordering

An Efficient CNN Accelerator for Pattern-Compressed Sparse Neural Networks on FPGA

Eyelet: A Cross-Mesh NoC-Based Fine-Grained Sparse CNN Accelerator for Spatio-Temporal Parallel Computing Optimization

An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs

WRA-SS: A High-Performance Accelerator Integrating Winograd with Structured Sparsity for Convolutional Neural Networks

Work-in-Progress: A High-performance FPGA Accelerator for Sparse Neural Networks