Abstract: Fine-grained sparse convolutional neural networks (CNNs) achieve a better trade-off between model accuracy and size than coarse-grained sparse CNNs. However, because of their irregular data structures and unbalanced computation loads, fine-grained sparse CNNs struggle to fully exploit their computation and storage advantages on general-purpose edge hardware. Existing custom sparse accelerators approach the problem by emulating a balanced load through software or computational strategies, and neglect to explore the adaptability and parallelism of the computing architecture itself for fine-grained sparse models. To address these challenges, a cross-mesh NoC-based accelerator architecture is proposed. This architecture matches the irregular structure of fine-grained sparse CNN weights and enhances the spatio-temporal parallelism of fine-grained sparse CNNs. First, a sparse multiplier unit (SMU) array and an adder array are designed so that the multiplication and accumulation operations of convolution can execute in parallel. Then, element-wise unroll-based nonzero weight multiplication is mapped to the SMU array to provide more flexible spatial parallelism. A horizontal and vertical cross-mesh NoC is proposed for flexible dataflow scheduling between the SMU and adder arrays, which further improves temporal parallelism. This architecture allows the multiplication and accumulation operations in convolution to be decoupled and pipelined with negligible latency. Finally, the proposed accelerator architecture is implemented on the ZU9EG platform. The experimental results show that the proposed accelerator achieves frame rates of 509.9, 249.3, 100.7, 48.4, and 168.9 frames per second (FPS) for AlexNet, VGG-16, ResNet-18, MobileNet-v2, and EfficientNet, respectively. Compared with related works, this accelerator improves inference speed by $1.1\times \sim 36.1\times$ and energy efficiency by $2.4\times \sim 13.4\times$.
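The idea of decoupling multiplication from accumulation can be illustrated with a small software model. The sketch below is only an assumption-laden illustration, not the accelerator's actual design: the function names `sparse_conv2d_decoupled` and `dense_conv2d` are hypothetical, the model handles a single channel with stride 1 and no padding, and it does not model the NoC, scheduling, or timing. It shows only that multiplying each nonzero weight element-wise in one stage and accumulating the products in a separate stage yields the same result as a dense convolution while skipping the zero weights.

```python
def sparse_conv2d_decoupled(x, w):
    """Single-channel, stride-1, no-padding convolution (cross-correlation form).

    x: input feature map as a list of rows.
    w: square kernel as a list of rows, possibly containing zeros.
    Hypothetical software model; the stage names only loosely mirror a
    multiplier array and an adder array.
    """
    k = len(w)
    out_h = len(x) - k + 1
    out_w = len(x[0]) - k + 1

    # Keep only the nonzero weights, each tagged with its kernel coordinate.
    nonzeros = [(r, c, w[r][c]) for r in range(k) for c in range(k) if w[r][c] != 0]

    # Stage 1 (multiplication): every nonzero weight is multiplied with all
    # input elements it touches; each product is tagged with its output coordinate.
    products = []
    for r, c, val in nonzeros:
        for i in range(out_h):
            for j in range(out_w):
                products.append((i, j, val * x[i + r][j + c]))

    # Stage 2 (accumulation): products are routed to their output position and summed.
    out = [[0] * out_w for _ in range(out_h)]
    for i, j, p in products:
        out[i][j] += p
    return out


def dense_conv2d(x, w):
    """Reference dense convolution used only to check the sparse model."""
    k = len(w)
    out_h = len(x) - k + 1
    out_w = len(x[0]) - k + 1
    return [
        [sum(w[r][c] * x[i + r][j + c] for r in range(k) for c in range(k))
         for j in range(out_w)]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    w = [[1, 0], [0, -1]]  # two of four weights are zero
    print(sparse_conv2d_decoupled(x, w))
    print(dense_conv2d(x, w))
```

In this toy case the sparse path performs 8 multiplications (2 nonzero weights times 4 output positions) where the dense reference performs 16. Because the product list is produced independently of the accumulation loop, the two stages could in principle overlap in time, which is the kind of pipelining the abstract describes.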

Design of Sparse Convolutional Neural Network Accelerator

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

Accelerator for Sparse Convolutional Neural Networks Based on Shift Units

An Efficient Accelerator for Sparse Convolutional Neural Networks

A Low-Power Sparse Convolutional Neural Network Accelerator with Pre-Encoding Radix-4 Booth Multiplier

A Reconfigurable Accelerator for Sparse Convolutional Neural Networks

A Convolutional Spiking Neural Network Accelerator with the Sparsity-Aware Memory and Compressed Weights

An Energy-Efficient Spiking Neural Network Accelerator Based on Spatio-Temporal Redundancy Reduction

Design of a Convolutional Neural Network Accelerator Based on On-Chip Data Reordering

An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective

An Event-driven Spiking Neural Network Accelerator with On-chip Sparse Weight

Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

An Efficient Spiking Neural Network Accelerator with Sparse Weight

PULSE: Parametric Hardware Units for Low-power Sparsity-Aware Convolution Engine

Neural Synaptic Plasticity-Inspired Computing: A High Computing Efficient Deep Convolutional Neural Network Accelerator

Work-in-Progress: A High-performance FPGA Accelerator for Sparse Neural Networks

SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks

An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks

A 3D Tiled Low Power Accelerator for Convolutional Neural Network

A Systolic Array-Based Scheduling Strategy for Sparse CNN Accelerators

Eyelet: A Cross-Mesh NoC-Based Fine-Grained Sparse CNN Accelerator for Spatio-Temporal Parallel Computing Optimization