Abstract:<p>Embedded devices are common platforms for deploying inference networks, relying on customized accelerators to deliver the expected performance under strict resource constraints. During DNN inference, the sparsity in the activations and weights of every layer causes a large number of ineffectual memory accesses and computing operations. Data compression is adopted in accelerator design as a data-pruning method: a specific data packaging format eliminates the zero-valued data. However, compression breaks, to varying degrees, the data regularity on which the processing arrays of DNN accelerators depend. The resulting irregular data organization complicates data access, which must be compensated for with additional control and decoding logic.</p><p>A sparsity-supporting accelerator architecture can combine an efficient data packaging method with sophisticated memory-access schemes and a parallel on-chip decoder structure to balance memory access and computation. In this paper, we propose a flexible, highly parallel accelerator architecture that exploits sparsity in DNNs to achieve high performance with low energy consumption, using parallel optimization together with a quantitative data packaging method that is efficient and stable across different degrees of sparsity. The total DRAM accesses, performance, and energy consumption of the proposed sparse architecture are evaluated on different inference networks. Experiments show that the proposed data packaging method requires significantly fewer DRAM accesses than other commonly used sparse data compression storage formats, and that, with the proposed optimizations, the sparse accelerator architecture achieves performance improvements and energy savings of up to 1.2x and 1.6x, respectively, over a comparably provisioned accelerator that does not support sparsity.
In addition, the proposed accelerator architecture achieves energy-efficiency and performance improvements of up to 1.70x and 1.56x, respectively, compared with state-of-the-art architectures.</p>
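
To make the abstract's point about zero elimination, storage footprint, and irregularity concrete, here is a minimal Python sketch of a generic bitmap (zero-mask) compression scheme compared with dense storage and a coordinate (index, value) format. It is not the data packaging method proposed in this paper, which the abstract does not describe. The function names, the 8-bit value width, the 4-bit index width, and the example activation vector are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers: a generic bitmap scheme, NOT the paper's packaging method.


def bitmap_compress(values: List[int]) -> Tuple[List[int], List[int]]:
    """Split a dense vector into a 0/1 occupancy mask and its nonzero values."""
    mask = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return mask, nonzeros


def bitmap_decompress(mask: List[int], nonzeros: List[int]) -> List[int]:
    """Rebuild the dense vector; this mask walk stands in for the extra decode logic."""
    it = iter(nonzeros)
    return [next(it) if m else 0 for m in mask]


def storage_bits(values: List[int], value_bits: int = 8, index_bits: int = 4) -> Dict[str, int]:
    """Footprint in bits of dense, bitmap, and coordinate storage (assumed bit widths)."""
    nnz = sum(1 for v in values if v != 0)
    return {
        "dense": len(values) * value_bits,           # every value stored
        "bitmap": len(values) + nnz * value_bits,    # 1 mask bit per slot + nonzeros
        "coordinate": nnz * (index_bits + value_bits),  # (index, value) per nonzero
    }


def nonzeros_per_block(values: List[int], block: int) -> List[int]:
    """Nonzeros in each fixed-size block; uneven counts are the irregularity PEs see."""
    return [sum(1 for v in values[i:i + block] if v != 0)
            for i in range(0, len(values), block)]


if __name__ == "__main__":
    # Hypothetical 16-element activation vector with 5 nonzeros.
    acts = [0, 3, 0, 0, 7, 0, 0, 0, 1, 2, 0, 5, 0, 0, 0, 0]
    mask, nz = bitmap_compress(acts)
    assert bitmap_decompress(mask, nz) == acts
    print(storage_bits(acts))           # {'dense': 128, 'bitmap': 56, 'coordinate': 60}
    print(nonzeros_per_block(acts, 4))  # [1, 1, 3, 0]
```

With these assumed bit widths, the coordinate format becomes smaller than the bitmap only when fewer than 4 of the 16 values are nonzero, so which common format wins depends on the degree of sparsity. The uneven per-block counts show why a processing element fed from compressed data receives varying amounts of work, which is the irregularity the abstract says must be compensated for in hardware.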

Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging

An Efficient Spiking Neural Network Accelerator with Sparse Weight.

Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

Cerebron: A Reconfigurable Architecture for Spatiotemporal Sparse Spiking Neural Networks

Twofold Sparsity: Joint Bit- and Network-Level Sparsity for Energy-Efficient Deep Neural Network Using RRAM Based Compute-In-Memory

A Theory of I/O-Efficient Sparse Neural Network Inference

Sparsity-Aware Non-Volatile Computing-In-Memory Macro with Analog Switch Array and Low-Resolution Current-Mode ADC.

An Event-driven Spiking Neural Network Accelerator with On-chip Sparse Weight

A Low-Cost Hardware-Friendly Spiking Neural Network Based on Binary MRAM Synapses, Accelerated Using In-Memory Computing

A A 22nm 0.43pJ/SOP Sparsity-Aware In-Memory Neuromorphic Computing System with Hybrid Spiking and Artificial Neural Network and Configurable Topology