Abstract:<p>Embedded devices are common platforms for deploying inference networks, and they rely on customized accelerators to deliver the required performance under strict resource constraints. During DNN inference, the sparsity present in the activations and weights of every layer causes a large number of ineffective memory accesses and compute operations. Accelerator designs therefore adopt data compression as a data pruning method, eliminating zero-valued data through a specific data packaging method. However, data compression breaks, to varying degrees, the data regularity that the processing array of a DNN accelerator relies on for its calculations. The more complex data access caused by the irregular data organization requires extra control logic and decoding logic to compensate.</p><p>An accelerator architecture that supports sparsity can combine a sophisticated memory access scheme and a parallel on-chip decoder structure with an efficient data packaging method to balance memory access and computation. In this paper, we propose a flexible and highly parallel accelerator architecture that exploits the sparsity in DNNs to achieve high performance with low energy consumption. It uses a quantitative data packaging method that is efficient and stable across different degrees of sparsity, together with parallel optimization. We evaluate the total DRAM accesses, performance, and energy consumption of the proposed sparse architecture on different inference networks. Experiments show that the proposed data packaging method requires significantly fewer DRAM accesses than other commonly used sparse data compression storage methods. With the optimization method proposed in this paper, the sparse accelerator architecture achieves performance improvement and energy savings of up to 1.2x and 1.6x, respectively, over a comparably provisioned accelerator that does not support sparsity. In addition, the proposed accelerator architecture achieves energy efficiency and performance improvements of up to 1.70x and 1.56x, respectively, compared with state-of-the-art architectures.</p>
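The abstract does not spell out the proposed packaging format. As a rough illustration of the general idea of zero-value compression only, the sketch below uses a plain bitmap encoding, which is one commonly used sparse storage scheme and is not claimed to be the paper's method. The function names, the sample activation vector, and the 8-bit word width are all assumptions made for this example.

```python
from typing import List, Tuple


def pack_bitmap(values: List[int]) -> Tuple[List[int], List[int]]:
    """Drop zeros; keep a 1-bit-per-element mask marking non-zero positions."""
    bitmap = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return bitmap, nonzeros


def unpack_bitmap(bitmap: List[int], nonzeros: List[int]) -> List[int]:
    """Decoder step: re-insert zeros so the data is regular (dense) again."""
    it = iter(nonzeros)
    return [next(it) if bit else 0 for bit in bitmap]


def storage_bits(bitmap: List[int], nonzeros: List[int], word_bits: int = 8) -> int:
    """Packed size: one mask bit per element plus word_bits per non-zero value.

    word_bits=8 is an assumed data width, chosen only for illustration.
    """
    return len(bitmap) + word_bits * len(nonzeros)


# Hypothetical activation vector with 5 of 8 entries equal to zero.
activations = [0, 3, 0, 0, 7, 0, 1, 0]
bitmap, nonzeros = pack_bitmap(activations)
restored = unpack_bitmap(bitmap, nonzeros)

dense_bits = 8 * len(activations)
packed_bits = storage_bits(bitmap, nonzeros)
print(bitmap, nonzeros, restored == activations, dense_bits, packed_bits)
```

Even in this toy form, the position of each surviving value depends on the mask, so the consumer has to decode before it can address the data regularly. This is the kind of extra control and decoding logic the abstract refers to.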

DANNA: A Dimension-Aware Neural Network Accelerator for Unstructured Sparsity

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

Deep Neural Network Acceleration with Sparse Prediction Layers

Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging

Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

SPAT: FPGA-based Sparsity-Optimized Spiking Neural Network Training Accelerator with Temporal Parallel Dataflow

A Data-Driven Asynchronous Neural Network Accelerator

Work-in-Progress: A High-performance FPGA Accelerator for Sparse Neural Networks

A Fine-Grained Sparse Accelerator for Multi-Precision DNN

Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks

A Hybrid Sparse-dense Defensive DNN Accelerator Architecture against Adversarial Example Attacks

An Efficient Accelerator for Sparse Convolutional Neural Networks

Design of a Convolutional Neural Network Accelerator Based on On-Chip Data Reordering

Accelerating Sparse DNNs Based on Tiled GEMM

An Energy-Efficient Spiking Neural Network Accelerator Based on Spatio-Temporal Redundancy Reduction

LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks

A Computing Efficient Hardware Architecture for Sparse Deep Neural Network Computing

AccSS3D: Accelerator for Spatially Sparse 3D DNNs