Abstract:<p>Embedded devices are common platforms for deploying inference networks, and they rely on customized accelerators to reach the required performance under strict resource constraints. During DNN inference, the sparsity in the activations and weights of every layer causes a large number of ineffective memory accesses and computing operations. Data compression is adopted in accelerator design as a data pruning method: it eliminates zero-valued data by means of a specific data packaging method. However, data compression breaks, to varying degrees, the data regularity that the processing array of a DNN accelerator relies on. The irregular data organization complicates data access, so extra control logic and decoding logic must be added to compensate.</p><p>An accelerator architecture that supports sparsity can combine a sophisticated memory access scheme and a parallel on-chip decoder structure with an efficient data packaging method to balance memory access and computing. In this paper, we propose a flexible and highly parallel accelerator architecture that uses a quantitative data packaging method, which is efficient and stable across different degrees of sparsity, together with parallel optimization to exploit the sparsity in DNNs and achieve high performance with low energy consumption. The total DRAM accesses, performance, and energy consumption of the proposed sparse architecture are evaluated on different inference networks. Experiments show that the proposed data packaging method requires significantly fewer DRAM accesses than other commonly used sparse data compression storage methods. With the optimization method proposed in this paper, the performance improvement and energy saving of the sparse accelerator architecture are up to 1.2x and 1.6x, respectively, over a comparably provisioned accelerator that does not support sparsity. In addition, the proposed accelerator architecture achieves energy efficiency and performance improvements of up to 1.70x and 1.56x compared with state-of-the-art architectures.</p>
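<p>To make the trade-off concrete, the following is a minimal sketch of one generic zero-eliminating format, a bitmap encoding. It is not the quantitative data packaging method proposed in the paper, which the abstract does not specify. The function names and the 8-bit value width are assumptions chosen for illustration. The sketch shows both the storage saving and the reason decoding becomes irregular: the position of each stored value depends on a running count over the bitmap.</p>

```python
from typing import List, Tuple


def pack_bitmap(values: List[int]) -> Tuple[List[int], List[int]]:
    """Split a dense vector into a 1-bit-per-element mask and its nonzero values."""
    bitmap = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return bitmap, nonzeros


def unpack_bitmap(bitmap: List[int], nonzeros: List[int]) -> List[int]:
    """Rebuild the dense vector; each set bit consumes the next stored value."""
    stored = iter(nonzeros)
    return [next(stored) if bit else 0 for bit in bitmap]


def dense_bits(values: List[int], value_bits: int = 8) -> int:
    """Storage for the uncompressed vector (value_bits is an assumed width)."""
    return len(values) * value_bits


def packed_bits(bitmap: List[int], nonzeros: List[int], value_bits: int = 8) -> int:
    """Storage for the packed form: one mask bit per element plus the nonzeros."""
    return len(bitmap) + len(nonzeros) * value_bits


# Hypothetical activation row with 5 of 8 entries equal to zero.
activations = [0, 3, 0, 0, 7, 0, 1, 0]
bitmap, nonzeros = pack_bitmap(activations)

assert unpack_bitmap(bitmap, nonzeros) == activations
print(dense_bits(activations), packed_bits(bitmap, nonzeros))  # prints "64 32"
```

<p>In this sketch the packed row needs half the bits of the dense row, but a processing element can no longer locate the i-th activation at a fixed offset. That loss of regularity is the kind of cost that extra control and decoding logic has to absorb.</p>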

Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging