ByteGNN: Efficient Graph Neural Network Training at Large Scale
Yifan Wu, Zhezheng Song, Shuai Zhang, Che Zheng, Hongzhi Chen, James Cheng, Yuxuan Cheng, Han Yang, Changji Li
DOI: https://doi.org/10.14778/3514061.3514069
IF: 2.5
2022-02-01
Proceedings of the VLDB Endowment
Abstract: Graph neural networks (GNNs) have shown excellent performance in a wide range of applications such as recommendation, risk control, and drug discovery. With the increase in the volume of graph data, distributed GNN systems have become essential to support efficient GNN training. However, existing distributed GNN training systems suffer from various performance issues, including high network communication cost, low CPU utilization, and poor end-to-end performance. In this paper, we propose ByteGNN, which addresses the limitations of existing distributed GNN systems with three key designs: (1) an abstraction of mini-batch graph sampling to support high parallelism, (2) a two-level scheduling strategy to improve resource utilization and to reduce the end-to-end GNN training time, and (3) a graph partitioning algorithm tailored for GNN workloads. Our experiments show that ByteGNN outperforms the state-of-the-art distributed GNN systems with up to 3.5-23.8 times faster end-to-end execution, 2-6 times higher CPU utilization, and around half of the network communication cost.
Computer Science