ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation

Dengke Han, Meng Wu, Runzhen Xue, Mingyu Yan, Xiaochun Ye, Dongrui Fan
2024-06-03
Abstract: Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, which has led to their wide application in many critical real-world domains. Most HGNN models leverage attention mechanisms to significantly improve model accuracy, albeit at the cost of increased computational complexity and memory bandwidth requirements. Fortunately, the attention disparity from source vertices towards a common target vertex unveils an opportunity to boost model execution performance by pruning unimportant source vertices during neighbor aggregation. In this study, we begin with a quantitative analysis of the attention disparity in HGNN models, where the importance of different source vertices varies for the same target vertex. To fully exploit this finding for inference acceleration, we propose a runtime pruning method based on a min-heap and map it to a dedicated hardware pruner to discard unimportant vertices. Because the pruning overhead itself is non-negligible and cannot be amortized by the conventional staged execution paradigm, we introduce an operation-fusion execution flow for HGNNs that overlaps the pruning overhead while harnessing inter-stage parallelism. Finally, we present the design of a novel HGNN accelerator, ADE-HGNN, tailored to support the proposed execution framework. Our experimental results demonstrate that ADE-HGNN achieves an average performance improvement of 28.21x over the NVIDIA T4 GPU and 7.98x over the more advanced A100 GPU, with the inference accuracy loss kept within a negligible range of 0.11% to 1.47%. Furthermore, ADE-HGNN reduces energy consumption to 1.97% and 5.37% of that of the two platforms, respectively.
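The min-heap pruning idea can be illustrated in software. The following is a minimal sketch, not the paper's hardware pruner: it assumes each target vertex keeps a fixed budget of `k` source vertices ranked by raw attention score, and it adds a hypothetical `retained_attention_mass` helper as one possible way to quantify attention disparity (the share of softmax attention the kept neighbors carry). Both function names and the fixed-`k` policy are illustrative assumptions.

```python
import heapq
import math
from typing import List, Tuple


def prune_neighbors(scores: List[Tuple[int, float]], k: int) -> List[int]:
    """Return the ids of the k source vertices with the highest attention scores.

    A min-heap of size k keeps the weakest retained neighbor at heap[0], so each
    new candidate is either rejected with one comparison or swapped in with an
    O(log k) heapreplace.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # (score, source_id), smallest score on top
    for src, score in scores:
        if len(heap) < k:
            heapq.heappush(heap, (score, src))
        elif score > heap[0][0]:
            # The candidate beats the weakest kept neighbor: evict the root.
            heapq.heapreplace(heap, (score, src))
    # Report retained sources from most to least important.
    return [src for _, src in sorted(heap, reverse=True)]


def retained_attention_mass(scores: List[Tuple[int, float]], kept: List[int]) -> float:
    """Fraction of the target's softmax attention carried by the kept neighbors."""
    m = max(score for _, score in scores)  # subtract the max for numerical stability
    exps = {src: math.exp(score - m) for src, score in scores}
    return sum(exps[s] for s in kept) / sum(exps.values())


# Hypothetical raw attention scores of five source vertices toward one target.
scores = [(0, -1.0), (1, 2.0), (2, 0.0), (3, 1.5), (4, -0.5)]
kept = prune_neighbors(scores, k=2)
print(kept, round(retained_attention_mass(scores, kept), 3))  # [1, 3] 0.857
```

In this toy example, keeping two of five neighbors retains about 86% of the softmax attention. That is the kind of skew that makes pruning cheap in accuracy terms; the actual disparity in real HGNN workloads is what the paper measures.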
Hardware Architecture
What problem does this paper attempt to address?
This paper addresses how to accelerate Heterogeneous Graph Neural Network (HGNN) execution by exploiting attention disparity while keeping the inference accuracy loss within an acceptable range. Specifically, the paper focuses on the following aspects:

1. **High computational complexity of the attention mechanism**: Most mainstream HGNN models introduce attention mechanisms to significantly improve accuracy, but this also increases computational complexity and memory bandwidth requirements, which lowers execution efficiency.
2. **Exploitation of attention disparity**: The paper finds that the attention values from different source vertices toward the same target vertex differ; that is, different source vertices have different importance to the same target vertex. Selectively pruning the relatively unimportant neighbor vertices can greatly improve execution performance while keeping the inference accuracy loss negligible.
3. **Limitations of existing platforms**: Existing general-purpose hardware architectures and the staged execution paradigm face significant challenges when pruning neighbor vertices, because pruning itself incurs additional overhead that the traditional staged execution paradigm cannot amortize.

To address these challenges, the paper proposes the following solutions:

- **Runtime neighbor pruning based on a min-heap**: Efficiently remove unimportant neighbor vertices using a min-heap data structure, mapped to a dedicated hardware pruner.
- **Operation-fusion execution flow**: Fuse operations across stages to overlap the pruning overhead and exploit inter-stage parallelism.
- **New HGNN accelerator design**: Design a novel accelerator, ADE-HGNN, that supports this optimized execution flow.
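The contrast between staged and fused execution can be sketched in plain Python as a rough software analogy, not the accelerator's dataflow. The staged flow materializes every attention score for the whole graph before pruning starts. The fused flow feeds each score into a per-target min-heap as soon as it is computed and aggregates that target immediately. The dot-product `attention_score`, the fixed budget `k`, and all function names are hypothetical assumptions; real HGNN attention uses learned, type-specific projections.

```python
import heapq
import math
from typing import Dict, List, Tuple

Feats = Dict[int, List[float]]
Graph = Dict[int, List[int]]  # target vertex -> list of source vertices


def attention_score(src: List[float], dst: List[float]) -> float:
    # Hypothetical stand-in for the model's learned attention function.
    return sum(a * b for a, b in zip(src, dst))


def softmax_aggregate(kept: List[Tuple[float, int]], feats: Feats, dim: int) -> List[float]:
    # Normalize attention over the surviving neighbors only, then take the
    # attention-weighted sum of their feature vectors.
    out = [0.0] * dim
    if not kept:
        return out
    kept = sorted(kept)  # fixed order so both flows sum in the same order
    m = max(s for s, _ in kept)
    exps = [(math.exp(s - m), src) for s, src in kept]
    total = sum(e for e, _ in exps)
    for e, src in exps:
        for d in range(dim):
            out[d] += (e / total) * feats[src][d]
    return out


def staged_flow(graph: Graph, feats: Feats, k: int) -> Dict[int, List[float]]:
    # Stage 1: score every edge of every target; all scores are buffered.
    scores = {t: [(attention_score(feats[s], feats[t]), s) for s in nbrs]
              for t, nbrs in graph.items()}
    # Stage 2: pruning starts only after stage 1 has finished for the graph.
    kept = {t: heapq.nlargest(k, sc) for t, sc in scores.items()}
    # Stage 3: aggregation over the pruned neighbor lists.
    return {t: softmax_aggregate(kept[t], feats, len(feats[t])) for t in graph}


def fused_flow(graph: Graph, feats: Feats, k: int) -> Dict[int, List[float]]:
    out = {}
    for t, nbrs in graph.items():
        heap: List[Tuple[float, int]] = []  # min-heap of the k best (score, src)
        for s in nbrs:
            # Each score goes to the pruner as soon as it is produced.
            score = attention_score(feats[s], feats[t])
            if len(heap) < k:
                heapq.heappush(heap, (score, s))
            elif heap and score > heap[0][0]:
                heapq.heapreplace(heap, (score, s))
        # Aggregate this target right away; no graph-wide score buffer.
        out[t] = softmax_aggregate(heap, feats, len(feats[t]))
    return out


feats = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0], 3: [0.9, 0.1]}
graph = {0: [1, 2, 3], 2: [0, 1, 3]}
print(fused_flow(graph, feats, k=2) == staged_flow(graph, feats, k=2))  # True
```

Both flows compute the same result; the difference is that the fused flow never holds all scores at once and lets scoring, pruning, and aggregation of one target proceed back to back. In hardware terms, this is what allows pruning latency to be overlapped with other work rather than added as a separate stage.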
Experimental results show that ADE-HGNN achieves an average speedup of 28.21x over the NVIDIA T4 GPU and 7.98x over the more advanced A100 GPU, while keeping the inference accuracy loss between 0.11% and 1.47%. ADE-HGNN also significantly reduces energy consumption, to 1.97% of that of the T4 platform and 5.37% of that of the A100 platform.
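The energy results are stated as fractions of each baseline. Inverting them gives the equivalent savings factors. This is only arithmetic on the reported averages, and it assumes the percentages compare total energy on the same workloads.

```python
# Reported ADE-HGNN energy as a fraction of each GPU baseline (from the text above).
reported = {"T4": 0.0197, "A100": 0.0537}


def energy_savings_factor(fraction_of_baseline: float) -> float:
    # Using a fraction p of the baseline's energy means the baseline uses 1/p times as much.
    return 1.0 / fraction_of_baseline


for platform, fraction in reported.items():
    print(f"{platform}: about {energy_savings_factor(fraction):.1f}x lower energy")
```

This corresponds to roughly 50.8x lower energy than the T4 and 18.6x lower than the A100.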