GNNLab: a Factored System for Sample-Based GNN Training over GPUs

Jianbang Yang, Dahai Tang, Xiaoniu Song, Lei Wang, Qiang Yin, Rong Chen, Wenyuan Yu, Jingren Zhou
DOI: https://doi.org/10.1145/3492321.3519557
2022-01-01
Abstract: We propose GNNLab, a sample-based GNN training system for a single-machine multi-GPU setup. GNNLab adopts a factored design for multiple GPUs, where each GPU is dedicated to either graph sampling or model training. It accelerates both tasks by eliminating GPU memory contention. To balance GPU workloads, GNNLab uses a global queue to bridge GPUs asynchronously and adopts a simple yet effective method to adaptively allocate GPUs to the different tasks. GNNLab further leverages temporary switching to avoid idle waiting on GPUs. Furthermore, GNNLab proposes a new pre-sampling based caching policy that takes both sampling algorithms and GNN datasets into account, and shows efficient and robust caching performance. Evaluations on three representative GNN models and four real-life graphs show that GNNLab outperforms the state-of-the-art GNN systems DGL and PyG by up to 9.1× (from 2.4×) and 74.3× (from 10.2×), respectively. In addition, our pre-sampling based caching policy achieves 90% to 99% of the optimal cache hit rate in all experiments.
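The abstract does not spell out how the pre-sampling based caching policy works, so the following is only a minimal sketch of one plausible reading: run the actual sampling algorithm on the actual graph for a few passes before training, count how often each vertex is touched, and cache the most frequently touched vertices. The function names (`neighbor_sample`, `presample_hotness`, `choose_cache`, `hit_rate`), the one-hop uniform sampler, and all parameters are hypothetical illustrations, not GNNLab's API or its published algorithm.

```python
import random
from collections import Counter


def neighbor_sample(adj, seeds, fanout, rng):
    """One-hop uniform neighbor sampling (a stand-in for the real sampler).

    Returns the set of vertices whose features a mini-batch would need.
    """
    touched = set(seeds)
    for v in seeds:
        nbrs = adj.get(v, [])
        k = min(fanout, len(nbrs))
        touched.update(rng.sample(nbrs, k))
    return touched


def presample_hotness(adj, train_ids, batch_size, fanout, passes, seed=0):
    """Run `passes` sampling-only epochs and count visits per vertex.

    Assumption: visit frequency during pre-sampling predicts visit
    frequency during training, because the same sampler and the same
    graph are used in both phases.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(passes):
        order = list(train_ids)
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            batch = order[i:i + batch_size]
            visits.update(neighbor_sample(adj, batch, fanout, rng))
    return visits


def choose_cache(visits, capacity):
    """Pick the `capacity` most visited vertices (ties broken by vertex id)."""
    ranked = sorted(visits, key=lambda v: (-visits[v], v))
    return set(ranked[:capacity])


def hit_rate(cache, accessed):
    """Fraction of feature accesses served from the cache."""
    accessed = list(accessed)
    if not accessed:
        return 0.0
    return sum(v in cache for v in accessed) / len(accessed)
```

Because the hotness estimate comes from running the sampler itself, the same code adapts to a different sampling algorithm or dataset without a hand-tuned heuristic such as vertex degree, which is the property the abstract attributes to the policy.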