RT-GNN: Accelerating Sparse Graph Neural Networks by Tensor-CUDA Kernel Fusion
Jianrong Yan,Wenbin Jiang,Dongao He,Suyang Wen,Yang Li,Hai Jin,Zhiyuan Shao
DOI: https://doi.org/10.1145/3702001
IF: 1.444
2024-01-01
ACM Transactions on Architecture and Code Optimization
Abstract:Graph Neural Networks (GNNs) have achieved remarkable successes in various graph-based learning tasks, thanks to their ability to leverage advanced GPUs. However, GNNs currently face challenges arising from the concurrent use of advanced Tensor Cores (TCs) and CUDA Cores (CDs) in GPUs. These challenges are further exacerbated due to repeated, inefficient, and redundant aggregations in GNN that result from the high sparsity and irregular non-zero distribution of real-world graphs. We propose RT-GNN, a GNN framework based on the fusion of advanced TC and CD units, to eliminate the aforementioned redundancies by exploiting the properties of an adjacency matrix. First, a novel GNN representation technique, hierarchical embedding graph (HEG) is proposed to manage the intermediate aggregation results hierarchically, which can further avoid redundancy in intermediate aggregations elegantly. Next, to address the inherent sparsity of graphs, RT-GNN places the blocks (a.k.a tiles) in HEG onto TCs and CDs according to their sparsity by a new block-based row-wise multiplication approach, which assembles TCs and CDs to work concurrently. Experimental results demonstrate that HEG outperforms HAG by an average speedup of 19.3 × for redundancy elimination performance, especially up to 72 × speedup on the dataset of ARXIV. Moreover, for overall performance, RT-GNN outperforms state-of-the-art GNN frameworks (including DGL, HAG, GNNAdvisor, and TC-GNN) by an average factor of 3.1 × while maintaining or even improving the task accuracy.