Accelerating Backward Aggregation in GCN Training With Execution Path Preparing on GPUs

Shaoxian Xu,Zhiyuan Shao,Ci Yang,Xiaofei Liao,Hai Jin
DOI: https://doi.org/10.1109/tpds.2022.3205642
IF: 5.3
2022-10-08
IEEE Transactions on Parallel and Distributed Systems
Abstract:The emerging Graph Convolutional Network (GCN) has been widely used in many domains, where it is important to improve the efficiencies of applications by accelerating GCN trainings. Due to the sparsity nature and exploding scales of input real-world graphs, state-of-the-art GCN training systems (e.g., GNNAdvisor) employ graph processing techniques to accelerate the message exchanging (i.e., aggregations) among the graph vertices. Nevertheless, these systems treat both the aggregation stages of forward and backward propagation phases as all-active graph processing procedures that indiscriminately conduct computations on all vertices of an input graph. In this article, we first point out that in a GCN training problem with a given training set on an input graph, its aggregation stages of backward propagation phases (called as backward aggregations in this article) can be equivalently converted to partially-active graph processing procedures, which conduct computations on only partial vertices of the input graph. By leveraging such a finding, we propose an execution path preparing method that collects and coalesces the graph data used during different training layers of backward aggregations, and constructs their corresponding sub-graphs (called as execution paths in this article) as inputs to conduct the backward training on GPUs. Further, we propose a structural-aware strategy for the execution paths to compute their optimal group sizes, so as to gain as high as possible performances on GPUs during the backward aggregations. The experiment results by conducting GCN training in typical real-world graphs show that compared with GNNAdvisor, our approach improves the performance of backward aggregations by up to 5.68x on NVIDIA P100 GPU, and up to 6.57x on NVIDIA V100S GPU
computer science, theory & methods,engineering, electrical & electronic
What problem does this paper attempt to address?