ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation

Dengke Han, Meng Wu, Runzhen Xue, Mingyu Yan, Xiaochun Ye, Dongrui Fan

2024-06-03

Abstract: Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, rendering them widely applied in many critical real-world domains. Most HGNN models leverage attention mechanisms to significantly improve model accuracy, albeit at the cost of increased computational complexity and memory bandwidth requirements. Fortunately, the attention disparity from source vertices towards a common target vertex unveils an opportunity to boost the model execution performance by pruning unimportant source vertices during neighbor aggregation. In this study, we commence with a quantitative analysis of the attention disparity in HGNN models, where the importance of different source vertices varies for the same target vertex. To fully exploit this finding for inference acceleration, we propose a runtime pruning method based on min-heap and map it to a dedicated hardware pruner to discard unimportant vertices. Given that the pruning overhead itself is non-negligible and cannot be amortized by the conventional staged execution paradigm, an operation-fusion execution flow of HGNNs is introduced to overlap the pruning overhead while harnessing inter-stage parallelism. Finally, we present the design of a novel HGNN accelerator, ADE-HGNN, tailored to support the proposed execution framework. Our experimental results demonstrate that ADE-HGNN achieves an average performance improvement of 28.21x over the NVIDIA GPU T4 platform and 7.98x over the advanced GPU A100, with the inference accuracy loss kept within a negligible range of 0.11%~1.47%. Furthermore, ADE-HGNN significantly reduces energy consumption to 1.97% and 5.37% of the two platforms, respectively.

Hardware Architecture

What problem does this paper attempt to address?

This paper addresses how to accelerate the execution of Heterogeneous Graph Neural Networks (HGNNs) by exploiting attention disparity while keeping the inference accuracy loss within an acceptable range. Specifically, it focuses on the following aspects:

1. **High computational complexity of the attention mechanism**: Most mainstream HGNN models use attention mechanisms to significantly improve accuracy, but doing so also increases computational complexity and memory bandwidth requirements, which lowers execution efficiency.
2. **Exploitation of attention disparity**: The paper finds that attention values from source vertices towards the same target vertex differ, that is, different source vertices have different importance to the same target vertex. Selectively removing the relatively unimportant neighbor vertices can greatly improve execution performance while keeping the inference accuracy loss negligible.
3. **Limitations of existing platforms**: Existing general-purpose hardware architectures and the staged execution paradigm face significant challenges when pruning neighbor vertices, because the pruning itself incurs additional overhead that the conventional staged execution paradigm cannot amortize.

To address these challenges, the paper proposes the following solutions:

- **Runtime neighbor pruning based on a min-heap**: Efficiently discard unimportant neighbor vertices using a min-heap data structure, mapped to a dedicated hardware pruner.
- **Operation-fusion execution flow**: Overlap the pruning overhead by fusing different operations, while exploiting the parallelism between stages.
- **New HGNN accelerator design**: Design a new accelerator, ADE-HGNN, to support the optimized execution flow above.
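The min-heap pruning idea can be illustrated in software. The following is a minimal sketch, not the paper's hardware pruner: it assumes a fixed keep-count `k` per target vertex and assumes that softmax is renormalized over the retained neighbors only; neither detail is confirmed by the summary above. The names `prune_neighbors` and `aggregate_pruned` are hypothetical.

```python
import heapq
import math


def prune_neighbors(scores, k):
    """Return the indices of the k highest-scoring neighbors, in ascending index order.

    `scores` holds one raw attention score per source vertex of a single
    target vertex. A fixed k is an assumption made for this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    heap = []  # min-heap of (score, index); the root is the weakest neighbor kept so far
    for idx, score in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (score, idx))
        elif score > heap[0][0]:
            # The new neighbor beats the weakest kept one: evict the root.
            heapq.heapreplace(heap, (score, idx))
    return sorted(idx for _, idx in heap)


def aggregate_pruned(scores, features, k):
    """Attention-weighted sum of the features of the retained neighbors only.

    Assumption: softmax is computed over the retained neighbors, not all of them.
    """
    kept = prune_neighbors(scores, k)
    max_score = max(scores[i] for i in kept)  # subtract the max for numerical stability
    weights = [math.exp(scores[i] - max_score) for i in kept]
    total = sum(weights)
    out = [0.0] * len(features[0])
    for w, i in zip(weights, kept):
        for d, value in enumerate(features[i]):
            out[d] += (w / total) * value
    return kept, out
```

Because the heap never holds more than `k` entries, each incoming score costs at most O(log k) work and the scores can be consumed as a stream, which is what makes this structure a natural fit for pruning at runtime rather than in a separate sorting pass.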
Experimental results show that ADE-HGNN achieves an average performance improvement of 28.21x over the NVIDIA GPU T4 platform and 7.98x over the advanced GPU A100 platform, while the inference accuracy loss stays between 0.11% and 1.47%. In addition, ADE-HGNN reduces energy consumption to 1.97% of the T4 platform and 5.37% of the A100 platform, respectively.

HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation

GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator Frontend with Graph Decoupling and Recoupling

SiHGNN: Leveraging Properties of Semantic Graphs for Efficient HGNN Acceleration

EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks

DyGA: A Hardware-Efficient Accelerator with Traffic-Aware Dynamic Scheduling for Graph Convolutional Networks

MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization

Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms

HGAMLP: Heterogeneous Graph Attention MLP with De-Redundancy Mechanism

HGNAS: Hardware-Aware Graph Neural Architecture Search for Edge Devices

Ada-HGNN: Adaptive Sampling for Scalable Hypergraph Neural Networks

Advancing Graph Neural Networks with HL-HGAT: A Hodge-Laplacian and Attention Mechanism Approach for Heterogeneous Graph-Structured Data

NTGAT: A Graph Attention Network Accelerator with Runtime Node Tailoring

RAHP: A Redundancy-aware Accelerator for High-performance Hypergraph Neural Network

CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling

GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks

DAHA: Accelerating GNN Training with Data and Hardware Aware Execution Planning

AdaptGear: Accelerating GNN Training Via Adaptive Subgraph-Level Kernels on GPUs

Algorithm/Hardware Co-Optimization for Sparsity-Aware SpMM Acceleration of GNNs

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture