Abstract: Neural networks (NNs) have achieved significant breakthroughs in many fields, but they also pose a great challenge to hardware platforms because state-of-the-art networks are both communication- and computation-intensive. Researchers have proposed model compression algorithms based on sparsification and quantization, along with dedicated hardware architectures, to accelerate various applications. However, the irregular memory access caused by sparsity severely disrupts the regularity of the compute-intensive loops. The architecture design for sparse neural networks is therefore crucial to better software-hardware co-design for neural network applications. To address these challenges, this paper first analyzes the computation patterns of different NN structures and unifies them into three forms: sparse matrix-vector multiplication, sparse matrix-matrix multiplication, and element-wise multiplication. Building on EIE, which supports only fully-connected networks and recurrent neural networks (RNNs), we extend the design to support convolutional neural networks (CNNs) using an input vector transform unit. This paper also designs a multi-precision multiplier with a supporting datapath, which lets the same hardware architecture achieve greater acceleration under low-bit quantization. The proposed accelerator achieves equivalent performance and energy efficiency of up to 574.2 GOPS and 42.8 GOPS/W for CNNs, and 110.4 GOPS and 8.24 GOPS/W for RNNs, under 4-bit quantization on a Xilinx XCKU115 FPGA running at 200 MHz. It is also the state-of-the-art accelerator supporting CNN-RNN-based models such as the long-term recurrent convolutional network, with 571.1 GOPS performance and 42.6 GOPS/W energy efficiency under the 4-bit data format.
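The unification described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's design: the compressed-column layout, the function names (`to_csc`, `spmv_csc`, `windows_1d`, `conv1d_as_spmv`), and the one-dimensional convolution are all assumptions chosen for illustration. The sketch shows a sparse matrix-vector product that skips zero weights and zero activations, and a convolution lowered onto that same product by turning each input window into a vector, which is only loosely analogous to what an input vector transform unit might do.

```python
def to_csc(dense):
    """Compress a dense row-major matrix into (values, row_idx, col_ptr, n_rows).

    Hypothetical helper: stores only nonzero weights, column by column.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    values, row_idx, col_ptr = [], [], [0]
    for c in range(n_cols):
        for r in range(n_rows):
            if dense[r][c] != 0:
                values.append(dense[r][c])
                row_idx.append(r)
        col_ptr.append(len(values))
    return values, row_idx, col_ptr, n_rows


def spmv_csc(csc, x):
    """Sparse matrix-vector product y = W @ x over the compressed columns."""
    values, row_idx, col_ptr, n_rows = csc
    y = [0] * n_rows
    for c, xc in enumerate(x):
        if xc == 0:
            continue  # zero activation: the whole column contributes nothing
        for k in range(col_ptr[c], col_ptr[c + 1]):
            y[row_idx[k]] += values[k] * xc
    return y


def windows_1d(signal, kernel_len):
    """Turn a 1-D input into one vector per sliding window (stride 1)."""
    return [signal[i:i + kernel_len]
            for i in range(len(signal) - kernel_len + 1)]


def conv1d_as_spmv(filters, signal):
    """Express a 1-D convolution as repeated sparse matrix-vector products.

    Each row of `filters` is one sparse kernel; each window is an input vector.
    """
    csc = to_csc(filters)
    kernel_len = len(filters[0])
    return [spmv_csc(csc, w) for w in windows_1d(signal, kernel_len)]


# Two sparse kernels of length 3 applied to a 4-sample input.
filters = [[0, 2, 0],
           [1, 0, 0]]
outputs = conv1d_as_spmv(filters, [1, 0, 3, 4])
print(outputs)
```

Under these assumptions, a fully-connected layer is a single call to `spmv_csc`, and the convolution reuses the same routine once per window, which is the sense in which different layer types can share one sparse datapath.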

A Precision-Scalable Energy-Efficient Bit-Split-and-Combination Vector Systolic Accelerator for NAS-Optimized DNNs on Edge

An Energy-Efficient Bit-Split-and-Combination Systolic Accelerator for NAS-Based Multi-Precision Convolution Neural Networks

An Energy-Efficient Mixed-Bitwidth Systolic Accelerator for NAS-Optimized Deep Neural Networks

A High Performance Multi-Bit-Width Booth Vector Systolic Accelerator for NAS Optimized Deep Learning Neural Networks

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability.

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU.

A Precision-Scalable Energy-Efficient Convolutional Neural Network Accelerator.

Enhancing the PE Utilization for Multi-Precision Systolic Array Via Optimizing Computation Latency

A Precision-Scalable Deep Neural Network Accelerator with Activation Sparsity Exploitation

A Fine-Grained Sparse Accelerator for Multi-Precision DNN.

A Low-Power Sparse Convolutional Neural Network Accelerator with Pre-Encoding Radix-4 Booth Multiplier

Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning.

A 24.3 μJ/image SNN Accelerator for DVS-Gesture with WS-LOS Dataflow and Sparse Methods

Addressing the issue of processing element under-utilization in general-purpose systolic deep learning accelerators

A Vector Systolic Accelerator for Multi-Precision Floating-Point High-Performance Computing

A Systolic SNN Inference Accelerator and Its Co-optimized Software Framework

Hybrid Stochastic-Binary Computing for Low-Latency and High-Precision Inference of CNNs

Energy-efficient Dense DNN Acceleration with Signed Bit-slice Architecture

Low-Complexity Precision-Scalable Multiply-Accumulate Unit Architectures for Deep Neural Network Accelerators

NBSSN: A Neuromorphic Binary Single-Spike Neural Network for Efficient Edge Intelligence.

Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators