Abstract:<p>Embedded devices are common carriers for deploying inference networks, and they rely on customized accelerators to achieve the promised performance under strict resource constraints. During DNN inference, the sparsity in the activations and weights of every layer causes a large number of ineffective memory accesses and computing operations. Data compression is adopted as a data-pruning method in accelerator design: it eliminates zero-valued data by means of a specific data packaging method. However, data compression breaks, to varying degrees, the data regularity on which the processing arrays of DNN accelerators depend. The resulting irregular data organization complicates data access, so extra control logic and decoding logic must be added to compensate.</p><p>An accelerator architecture that supports sparsity can combine a sophisticated memory access scheme and a parallel on-chip decoder structure with an efficient data packaging method to balance memory accessing and computing for acceleration. In this paper, we propose a flexible and highly parallel accelerator architecture that exploits the sparsity in DNNs to achieve high performance with low energy consumption. It uses a quantitative data packaging method, which is efficient and stable across different degrees of sparsity, together with parallel optimization. The total DRAM accesses, performance, and energy consumption of the proposed sparse architecture are evaluated on different inference networks. Experiments show that the proposed data packaging method requires significantly fewer DRAM accesses than other commonly used sparse data compression storage methods. After adopting the optimization method proposed in this paper, the sparse accelerator architecture improves performance by up to 1.2x and saves up to 1.6x energy over a comparably provisioned accelerator that does not support sparsity.
In addition, the proposed accelerator architecture achieves energy efficiency and performance improvements of up to 1.70x and 1.56x, respectively, compared with state-of-the-art architectures.</p>
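To make the trade-off described in the abstract concrete, here is a minimal sketch of one generic zero-elimination scheme: a bitmap format that stores one presence bit per element plus the packed nonzero values. This format is an assumption chosen for illustration and is not the quantitative data packaging method proposed in the paper. The names `pack_bitmap`, `prefix_offsets`, `unpack_bitmap`, and `sparse_dot` are hypothetical. The prefix offsets stand in for the extra decoding logic the abstract mentions: before a packed operand can be fetched, the decoder must count the nonzeros that precede it.

```python
from typing import List, Tuple


def pack_bitmap(values: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical bitmap packaging: a presence bit per element plus the nonzero values."""
    bitmap = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return bitmap, nonzeros


def prefix_offsets(bitmap: List[int]) -> List[int]:
    """Exclusive prefix popcount: offsets[i] is the number of nonzeros before position i."""
    offsets, count = [], 0
    for bit in bitmap:
        offsets.append(count)
        count += bit
    return offsets


def unpack_bitmap(bitmap: List[int], nonzeros: List[int]) -> List[int]:
    """Rebuild the dense vector, reinserting zeros where the presence bit is 0."""
    offsets = prefix_offsets(bitmap)
    return [nonzeros[offsets[i]] if bit else 0 for i, bit in enumerate(bitmap)]


def sparse_dot(bm_a, nz_a, bm_b, nz_b) -> Tuple[int, int]:
    """Dot product of two packed vectors; returns (result, number of effectual MACs).

    Only positions where both presence bits are set are multiplied, so MACs
    involving a zero-valued activation or weight are skipped.
    """
    off_a, off_b = prefix_offsets(bm_a), prefix_offsets(bm_b)
    total, macs = 0, 0
    for i, (a, b) in enumerate(zip(bm_a, bm_b)):
        if a and b:
            total += nz_a[off_a[i]] * nz_b[off_b[i]]
            macs += 1
    return total, macs


if __name__ == "__main__":
    activations = [0, 3, 0, 0, 5, 1, 0, 2]
    weights = [4, 0, 0, 7, 2, 0, 0, 3]
    bm_a, nz_a = pack_bitmap(activations)
    bm_w, nz_w = pack_bitmap(weights)
    assert unpack_bitmap(bm_a, nz_a) == activations  # lossless round trip
    print(sparse_dot(bm_a, nz_a, bm_w, nz_w))  # (16, 2): 2 of 8 MACs are effectual
```

The sequential prefix count is written for clarity. A hardware decoder would compute these offsets in parallel, and that parallel logic is part of the cost that irregular, compressed data imposes on a processing array.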

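The abstract's requirement that a packaging method stay efficient and stable across degrees of sparsity can be illustrated with a back-of-the-envelope storage model. The sketch below compares three generic layouts: dense, bitmap, and coordinate, which stores one explicit index per nonzero. The layouts, the 8-bit value width, the vector length, and the helper names `storage_bits` and `best_layout` are all assumptions. The model counts payload bits only and ignores alignment, metadata headers, and the burst granularity of real DRAM transactions.

```python
import math
from typing import Dict


def storage_bits(n: int, nnz: int, value_bits: int = 8) -> Dict[str, int]:
    """Payload bits for one n-element vector with nnz nonzeros under three generic layouts."""
    index_bits = max(1, math.ceil(math.log2(n)))  # bits to address one of n positions
    return {
        "dense": n * value_bits,                        # every element stored, zeros included
        "bitmap": n + nnz * value_bits,                 # 1 presence bit per element + nonzeros
        "coordinate": nnz * (value_bits + index_bits),  # explicit index stored with each nonzero
    }


def best_layout(n: int, nnz: int, value_bits: int = 8) -> str:
    """Name of the layout with the smallest footprint under this simple model."""
    costs = storage_bits(n, nnz, value_bits)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    n = 1024
    for sparsity in (0.0, 0.5, 0.9, 0.95):
        nnz = round(n * (1 - sparsity))
        print(f"sparsity={sparsity:.0%} nnz={nnz} "
              f"bits={storage_bits(n, nnz)} best={best_layout(n, nnz)}")
```

Under these assumptions, the cheapest layout changes with sparsity. Dense wins at 0%, bitmap at 50%, and coordinate at 90% and above. This shows why a single fixed compression format can behave inconsistently across layers whose activations and weights have different sparsity.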
Balancing Memory Accesses for Energy-Efficient Graph Analytics Accelerators

A design framework for processing-in-memory accelerator

An Energy-Efficient In-Memory Accelerator for Graph Construction and Updating

Exploring Memory Access Patterns for Graph Processing Accelerators

A Case for In-Memory Random Scatter-Gather for Fast Graph Processing

TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing

GraphIA: an In-Situ Accelerator for Large-Scale Graph Processing

DyGA: A Hardware-Efficient Accelerator with Traffic-Aware Dynamic Scheduling for Graph Convolutional Networks

GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing

Alleviating Irregularity in Graph Analytics Acceleration

Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform

GraphR: Accelerating Graph Processing Using ReRAM

EMS: Efficient Memory Subsystem Synthesis for Spatial Accelerators

GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing

Skyway: Accelerate Graph Applications with a Dual-Path Architecture and Fine-Grained Data Management

Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging

Memory-Bound Proof-of-Work Acceleration for Blockchain Applications

HyVE: Hybrid Vertex-Edge Memory Hierarchy for Energy-Efficient Graph Processing

ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators

SoGraph: A State-Aware Architecture for Out-of-Memory Graph Processing on HBM-Equipped FPGAs

Memory Access Optimization of a Neural Network Accelerator Based on Memory Controller