GCNTrain+: A Versatile and Efficient Accelerator for Graph Convolutional Neural Network Training
Zhuoran Song,Jiabei Long,Li Jiang,Naifeng Jing,Xiaoyao Liang
DOI: https://doi.org/10.1145/3705317
IF: 1.444
2024-01-01
ACM Transactions on Architecture and Code Optimization
Abstract:Recently, graph convolutional networks (GCNs) have gained wide attention due to their ability to capture node relationships in graphs. One problem appears when full-batch GCN is trained on large graph datasets, where the computational and memory requirements are unacceptable. To address this issue, mini-batch GCN training is introduced to improve the scalability of GCN training for large datasets by sampling and training only a subset of the graph in each batch. Although several acceleration techniques have been designed for boosting the efficiency of full-batch GCN, they lack attention to mini-batch GCN, which differs from full-batch GCN in terms of the sampled dynamic graph structures. Based on our previous work GCNTrain [28], which was originally excogitated for accelerating full-batch GCN training, we devise GCNTrain+—a universal accelerator to tackle the performance bottlenecks associated with both full-batch and mini-batch GCN training. GCNTrain+ is equipped with two engines to optimize computation and memory access in GCN training, respectively. To reduce the computation overhead, we propose to dynamically reconfigure the computation order based on the varying data dimensions involved in each training batch. Moreover, we build a unified computation engine to perform the sparse-dense matrix multiplications (SpDM) and sparse-sparse matrix multiplications (SpSpM) discovered in GCN training uniformly. To alleviate the memory burden, we devise a two-phased dynamic clustering mechanism to capture data locality as well as customized hardware to reduce the clustering overhead. We evaluate GCNTrain+ on seven datasets, and the result shows that GCNTrain+ achieves 136.0 ×, 52.6 ×, 2.2 ×, and 1.5 × speedup over CPU, GPU, GCNAX, and GCNTrain in full-batch GCN training. Additionally, GCNTrain+ outperforms them with speedups of 131.6 ×, 67.1 ×, 4.4 ×, and 1.5 × in mini-batch GCN training.