Abstract: Convolutional neural networks (CNNs), one of today's main flavors of deep learning techniques, dominate various image recognition tasks. As the model size of modern CNNs continues to grow, neural network compression techniques have been proposed to prune redundant neurons and synapses. However, prior techniques decouple software neural network compression from hardware acceleration and thus fail to balance multiple design parameters, including sparsity, performance, hardware area cost, and efficiency. More concretely, prior unstructured pruning techniques achieve high sparsity at the expense of extra performance overhead, while prior structured pruning techniques rely on strict sparse patterns, which leads to low sparsity and extra hardware cost. In this article, we propose OMNI, a framework for accelerating sparse CNNs on hardware accelerators. The innovation of OMNI is that it uses hardware-amenable on-chip memory partition patterns to seamlessly couple software CNN model compression with hardware CNN acceleration. To accelerate the compute-intensive convolution kernel, a promising hardware optimization approach is memory partitioning, which divides the original weight kernels into several groups so that different hardware processing elements can access the weights simultaneously. We exploit memory partition patterns, including block, cyclic, and hybrid, as CNN compression patterns. Our software CNN model compression balances the sparsity across different groups, and our hardware accelerator employs hardware parallelization in coordination with the sparse patterns, leading to a desirable compromise between sparsity and performance. We further develop performance models to help designers quickly identify the pattern factors subject to an area constraint. Finally, we evaluate our design on application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) platforms.
Experiments demonstrate that OMNI achieves a $3.4\times$ to $6.2\times$ speedup on modern CNNs over a comparable ideal dense CNN accelerator. OMNI shows a $114.7\times$ energy efficiency improvement compared with a GPU platform. OMNI is also evaluated on the Xilinx ZC706 and ZCU102 FPGA platforms, achieving 41.5 GOP/s and 125.3 GOP/s, respectively.
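
To make the idea of partition patterns doubling as compression patterns more concrete, the following is a minimal sketch, not OMNI's actual algorithm. It assumes a flattened weight vector, a fixed number of memory banks, and simple magnitude-based pruning; the function names (`block_partition`, `cyclic_partition`, `balanced_prune`) and the sample weights are hypothetical. It shows how pruning the same number of weights in every bank keeps the nonzero count balanced across groups, for either a block or a cyclic mapping.

```python
def block_partition(n, banks):
    """Bank index per element: contiguous chunks share a bank."""
    size = -(-n // banks)  # ceiling division
    return [i // size for i in range(n)]


def cyclic_partition(n, banks):
    """Bank index per element: consecutive elements rotate through the banks."""
    return [i % banks for i in range(n)]


def balanced_prune(weights, banks, keep_per_bank, partition):
    """Keep only the keep_per_bank largest-magnitude weights in each bank.

    Assumption for illustration: every bank ends up with the same number
    of nonzeros, so parallel processing elements would do equal work.
    """
    bank_of = partition(len(weights), banks)
    pruned = [0.0] * len(weights)
    for b in range(banks):
        # Indices that the chosen pattern maps to bank b.
        idx = [i for i, bank in enumerate(bank_of) if bank == b]
        idx.sort(key=lambda i: abs(weights[i]), reverse=True)
        for i in idx[:keep_per_bank]:
            pruned[i] = weights[i]
    return pruned


# Hypothetical 8-element kernel split over 2 banks, keeping 2 weights per bank.
weights = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8]
block_sparse = balanced_prune(weights, 2, 2, block_partition)
cyclic_sparse = balanced_prune(weights, 2, 2, cyclic_partition)
print(block_sparse)
print(cyclic_sparse)
```

The two patterns retain different weights from the same kernel because they group elements differently, which is why the choice of pattern and its factors affects both the achievable sparsity and the resulting accuracy.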

SpWMM: A High-Performance Sparse-Winograd Matrix-Matrix Multiplication Accelerator for CNNs

EWS: an Energy-Efficient CNN Accelerator with Enhanced Weight Stationary Dataflow

Deep Neural Network Acceleration with Sparse Prediction Layers

Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration

Spwa: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs

A Reconfigurable Accelerator for Sparse Convolutional Neural Networks

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm

A Winograd-Based CNN Accelerator with a Fine-Grained Regular Sparsity Pattern

ALSCA: A Large-Scale Sparse CNN Accelerator Using Position-First Dataflow and Input Channel Merging Approach

Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs

BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization

An Efficient CNN Accelerator for Pattern-Compressed Sparse Neural Networks on FPGA

Reconfigurable Spatial-Parallel Stochastic Computing for Accelerating Sparse Convolutional Neural Networks

An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective

OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Efficient Layer-Wise N:M Sparse CNN Accelerator with Flexible SPEC: Sparse Processing Element Clusters

A High-Performance Systolic Array Accelerator Dedicated for CNN

Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning

Algorithm/Hardware Co-Optimization for Sparsity-Aware SpMM Acceleration of GNNs

Adaptive Pixel-wise Structured Sparse Network for Efficient CNNs