FlexGraph: a flexible and efficient distributed framework for GNN training

Lei Wang,Qiang Yin,Chao Tian,Jianbang Yang,Rong Chen,Wenyuan Yu,Zihang Yao,Jingren Zhou
DOI: https://doi.org/10.1145/3447786.3456229
2021-04-21
Abstract:Graph neural networks (GNNs) aim to learn a low-dimensional feature for each vertex in the graph from its input high-dimensional feature, by aggregating the features of the vertex's neighbors iteratively. This paper presents Flex-Graph, a distributed framework for training GNN models. FlexGraph is able to efficiently train GNN models with flexible definitions of neighborhood and hierarchical aggregation schemes, which are the two main characteristics associated with GNNs. In contrast, existing GNN frameworks are usually designed for GNNs having fixed definitions and aggregation schemes. They cannot support different kinds of GNN models well simultaneously. Underlying FlexGraph are a simple GNN programming abstraction called NAU and a compact data structure for modeling various aggregation operations. To achieve better performance, FlexGraph is equipped with a hybrid execution strategy to select proper and efficient operations according to different contexts during aggregating neighborhood features, an application-driven workload balancing strategy to balance GNN training workload and reduce synchronization overhead, and a pipeline processing strategy to overlap computations and communications. Using real-life datasets and GNN models GCN, PinSage and MAGNN, we verify that NAU makes FlexGraph more expressive than prior frameworks (e.g., DGL and Euler) which adopt GAS-like programming abstractions, e.g., it can handle MAGNN that is beyond the reach of DGL and Euler. The evaluation further shows that FlexGraph outperforms the state-of-the-art GNN frameworks such as DGL and Euler in training time by on average 8.5× on GCN and PinSage.
Computer Science
What problem does this paper attempt to address?