Abstract: Neural networks (NNs) have achieved significant breakthroughs in many fields, but they also pose a great challenge to hardware platforms, since state-of-the-art neural networks are both communication- and computation-intensive. Researchers have proposed model compression algorithms based on sparsification and quantization, along with dedicated hardware architectures, to accelerate various applications. However, the irregular memory access caused by sparsity severely disrupts the regularity of compute-intensive loops. The architecture design for sparse neural networks is therefore crucial to better software and hardware co-design for neural network applications. To address these challenges, this paper first analyzes the computation patterns of different NN structures and unifies them into three forms: sparse matrix-vector multiplication, sparse matrix-matrix multiplication, and element-wise multiplication. Building on EIE, which supports only fully-connected networks and recurrent neural networks (RNNs), we extend the architecture to support convolutional neural networks (CNNs) using an input vector transform unit. This paper also designs a multi-precision multiplier with a supporting datapath, which allows the same hardware architecture to achieve greater acceleration under low-bit quantization. The proposed accelerator achieves equivalent performance and energy efficiency of up to 574.2 GOPS and 42.8 GOPS/W for CNNs, and 110.4 GOPS and 8.24 GOPS/W for RNNs, under 4-bit quantization on a Xilinx XCKU115 FPGA running at 200 MHz. It is also the state-of-the-art accelerator supporting CNN-RNN-based models such as the long-term recurrent convolutional network, with 571.1 GOPS performance and 42.6 GOPS/W energy efficiency under the 4-bit data format.
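As a rough illustration of how a convolution can be reduced to sparse matrix-vector multiplication, the following Python sketch may help. It is not the paper's design: the CSC weight layout, the im2col-style window unfolding standing in for the input vector transform unit, and all function names (`to_csc`, `spmv_csc`, `im2col`, `conv2d_via_spmv`) are assumptions made for this example only.

```python
def to_csc(dense):
    """Compress a dense row-major matrix into CSC arrays.

    Returns (values, row_idx, col_ptr, n_rows). The CSC layout is an
    assumption for this sketch, not the format specified in the abstract.
    """
    rows, cols = len(dense), len(dense[0])
    values, row_idx, col_ptr = [], [], [0]
    for c in range(cols):
        for r in range(rows):
            if dense[r][c] != 0:
                values.append(dense[r][c])
                row_idx.append(r)
        col_ptr.append(len(values))
    return values, row_idx, col_ptr, rows


def spmv_csc(csc, x):
    """Compute y = W @ x, touching only the stored nonzero weights."""
    values, row_idx, col_ptr, rows = csc
    y = [0] * rows
    for c, xc in enumerate(x):
        if xc == 0:
            continue  # zero activations contribute nothing, so skip the column
        for k in range(col_ptr[c], col_ptr[c + 1]):
            y[row_idx[k]] += values[k] * xc
    return y


def im2col(image, k):
    """Unfold every k x k window (stride 1, no padding) into a flat vector.

    This is a hypothetical stand-in for an input vector transform step.
    """
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def conv2d_via_spmv(kernels, image, k):
    """Apply flattened k*k kernels to an image as one SpMV per window."""
    csc = to_csc(kernels)
    return [spmv_csc(csc, window) for window in im2col(image, k)]


# Two sparse 2x2 kernels, each flattened into one row of the weight matrix.
kernels = [[1, 0, 0, -1],
           [0, 2, 0, 0]]
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
result = conv2d_via_spmv(kernels, image, 2)
# result holds one output vector (one entry per kernel) per window position.
```

Stacking the unfolded windows as the columns of a matrix turns the same computation into a single sparse matrix-matrix multiplication; the per-window loop is kept here only to make the reduction to SpMV explicit.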

Exploring the Granularity of Sparsity in Convolutional Neural Networks

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

Joint Sparsity with Mixed Granularity for Efficient GPU Implementation

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

Exploring Fine-Grained Sparsity in Convolutional Neural Networks for Efficient Inference

SUBP: Soft Uniform Block Pruning for 1 X N Sparse CNNs Multithreading Acceleration

Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization

Reconfigurable Spatial-Parallel Stochastic Computing for Accelerating Sparse Convolutional Neural Networks

SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training

Learning Sparse Patterns in Deep Neural Networks

Balanced Sparsity for Efficient DNN Inference on GPU

Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks

A Fine-Grained Sparse Accelerator for Multi-Precision DNN

A Pre-defined Sparse Kernel Based Convolution for Deep CNNs

Intragroup Sparsity for Efficient Inference

The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks

Two Sparsities Are Better Than One: Unlocking the Performance Benefits of Sparse-Sparse Networks

Effective Interplay between Sparsity and Quantization: From Theory to Practice

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

A Computing Efficient Hardware Architecture for Sparse Deep Neural Network Computing