Abstract: As a new class of graph embedding algorithms, graph neural networks (GNNs) have been widely used in many fields. However, GNN computation combines sparse graph processing with dense neural network computation, which makes it difficult to deploy efficiently on existing graph processing accelerators or neural network accelerators. Several GNN accelerators have recently been proposed, but the following challenges have not been fully solved: 1) the minibatch GNN inference scenario offers an opportunity for software-hardware co-design that can reduce the amount of computation by 30%, but this opportunity is not well exploited; in addition, the cost of message flow graph construction is large and may account for more than 50% of the total delay; 2) feature aggregation involves a large amount of data access and a relatively small amount of computation, which leads to low on-chip data reuse, only 10% of that of dense computing; and 3) without optimization of the sparse computing units, a simple memory bank and crossbar architecture can easily lead to bank access conflicts and load imbalance, reducing the utilization of computing units to less than 60%. To solve these problems, we propose an algorithm-hardware co-design scheme to accelerate GNN inference, which includes three techniques: 1) a reuse-aware sampling method for minibatch inference scenarios, which reduces computation by 30% and improves the on-chip reusability of local data; 2) nodewise parallelism-aware quantization, which quantizes the features and weights to integers of eight or four bits and reduces the amount of memory access by at least four times; and 3) an accelerator that supports the above techniques, in which a sampling-inference integration architecture supports the different operations. A multibank on-chip memory pool is designed to support data reuse, and edge stream reordering is used to reduce data access conflicts, improving the utilization of computing units by 1.5x. Combining these techniques, experiments show that our design achieves a 9.2x speedup and a 29x energy efficiency improvement compared with the Deep Graph Library framework running on servers equipped with CPU and GPU.
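To make the memory-access claim for quantization concrete, the following is a minimal sketch of a generic symmetric per-node (per-row) integer quantizer. It is not the nodewise parallelism-aware scheme of the paper, whose details the abstract does not give; the function names `quantize_per_node` and `dequantize`, the per-row scale, and the random test features are all assumptions made for illustration. It shows only that replacing 32-bit floating-point features with 8-bit integers cuts the feature payload by a factor of four.

```python
import numpy as np


def quantize_per_node(features, num_bits=8):
    """Symmetric per-row quantization (hypothetical helper, not the paper's method).

    Each row is treated as one node's feature vector and gets its own scale.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = np.max(np.abs(features), axis=1, keepdims=True)
    # Guard against all-zero rows so the division below is well defined.
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0).astype(np.float32)
    q = np.clip(np.rint(features / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float32 values."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)  # 4 nodes, 16 features each

q8, s8 = quantize_per_node(x, num_bits=8)
q4, s4 = quantize_per_node(x, num_bits=4)  # codes in [-7, 7], still held in int8 here

# Feature payload only: float32 (4 bytes) versus int8 (1 byte) per element.
ratio = x.nbytes / q8.nbytes  # 4.0
max_err = float(np.max(np.abs(x - dequantize(q8, s8))))
```

The sketch ignores the storage cost of the per-node scales and keeps 4-bit codes in 8-bit containers; packing two 4-bit codes per byte would be needed to realize a further reduction for the 4-bit case.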

An Efficient GCN Accelerator Based on Workload Reorganization and Feature Reduction

DyGA: A Hardware-Efficient Accelerator with Traffic-Aware Dynamic Scheduling for Graph Convolutional Networks

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks

HyGCN: A GCN Accelerator with Hybrid Architecture

LW-GCN: A Lightweight FPGA-based Graph Convolutional Network Accelerator

BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices

EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks

CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling

GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator Frontend with Graph Decoupling and Recoupling

MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization

GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture

SPA-GCN: Efficient and Flexible GCN Accelerator with an Application for Graph Similarity Computation

Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms

Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators

VersaGNN: a Versatile accelerator for Graph neural networks

Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing

RFC-HyPGCN: A Runtime Sparse Feature Compress Accelerator for Skeleton-Based GCNs Action Recognition Model with Hybrid Pruning

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing