EC-Graph: A Distributed Graph Neural Network System with Error-Compensated Compression

Zhen Song, Yu Gu, Jianzhong Qi, Zhigang Wang, Ge Yu
DOI: https://doi.org/10.1109/ICDE53745.2022.00053
2022-01-01
Abstract: The high training costs of graph neural networks (GNNs) have limited their applicability on large graphs, e.g., graphs with hundreds of millions of vertices, which have become common in the era of big data. A few recent studies propose distributed GNN systems. However, these systems may incur high communication costs due to the extensive message passing among graph vertices stored on different machines. To address these limitations, in this paper, 1) we propose a distributed GNN computation system named EC-Graph for CPU clusters, which drastically reduces the communication costs among the machines by message compression; 2) we design a requesting-end compensation method for the embeddings to mitigate the errors induced by compression in the forward propagation, and a Bit-Tuner to adaptively balance the model accuracy and message size; and 3) we propose a responding-end compensation approach for the embedding gradients in the backward propagation. Extensive experiments over large real-world datasets show that EC-Graph outperforms state-of-the-art distributed GNN systems on two CPU clusters of different sizes.
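To make the idea of error-compensated compression concrete, the following is a minimal sketch of generic error feedback with uniform quantization. It is not EC-Graph's actual algorithm: the abstract does not specify the compression scheme, where the residual is stored, or how the Bit-Tuner chooses the bit width. The names `quantize` and `ErrorCompensatedSender`, and the choice of a fixed `bits` parameter, are assumptions made purely for illustration.

```python
def quantize(values, bits):
    """Uniformly quantize floats to 2**bits levels over [min, max].

    Returns the dequantized values, i.e. what a receiver would reconstruct.
    (Illustrative only; the paper's compression scheme may differ.)
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo or levels == 0:
        return [lo for _ in values]
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]


class ErrorCompensatedSender:
    """Hypothetical helper: carries the quantization error into the next message."""

    def __init__(self, size, bits):
        self.bits = bits
        self.residual = [0.0] * size  # error left over from the previous round

    def send(self, values):
        # Add back the error that was lost in the previous round.
        corrected = [v + r for v, r in zip(values, self.residual)]
        compressed = quantize(corrected, self.bits)
        # Remember what this round's compression lost.
        self.residual = [c - q for c, q in zip(corrected, compressed)]
        return compressed


if __name__ == "__main__":
    sender = ErrorCompensatedSender(size=3, bits=2)
    for message in ([0.1, 0.5, 0.9], [0.2, 0.4, 1.0], [0.0, 0.7, 0.3]):
        print(sender.send(message))
```

Under this scheme the per-round errors telescope: the sum of all messages sent plus the final residual equals the sum of the true values, so the compression error does not accumulate across rounds. A smaller `bits` value shrinks each message at the cost of a larger per-round error, which is the kind of trade-off the abstract says the Bit-Tuner balances.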