Attribute-Driven Streaming Edge Partitioning with Reconciliations for Distributed Graph Neural Networks Training

Zongshen Mu, Siliang Tang, Yueting Zhuang, Dianhai Yu
DOI: https://doi.org/10.2139/ssrn.4283544
2022-01-01
Abstract: Present distributed graph training frameworks evenly partition a large graph into small chunks to suit distributed storage, leverage an independent interface to access neighbors, and train Graph Neural Networks (GNNs) on a cluster of machines to update weights. However, they design storage and training separately, which incurs a huge communication cost for retrieving neighborhoods. During the storage phase, traditional heuristic graph partitioning not only suffers from memory overhead because it loads the full graph into memory, but also neglects meaningful node attributes that could guide node or edge assignment by grouping semantically related structures (e.g., user interests and item categories in a recommendation graph). Moreover, in the weight update phase, direct averaging synchronization struggles to handle heterogeneous local models, where each machine's data is loaded from a different subgraph, resulting in slow convergence. To solve these problems, we propose a novel distributed graph training approach, Attribute-driven Streaming Edge Partitioning with Reconciliations (ASEPR), in which each local model loads only the subgraph stored on its own machine to reduce communication. ASEPR first clusters nodes with similar attributes into the same partition to maintain semantic structure and preserve multi-hop neighbor locality. Streaming partitioning combined with attribute clustering is then applied to subgraph assignment to alleviate memory overhead. After local GNN training on the distributed machines, we deploy cross-layer reconciliation strategies for the heterogeneous local models, improving the averaged global model through knowledge distillation and contrastive learning. Extensive experiments on four large graph datasets in node classification and link prediction tasks show that our model outperforms DistDGL with fewer resource requirements and an almost 2x speedup in convergence.
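The abstract does not specify ASEPR's exact clustering or scoring rules. Purely as an illustration of the general idea of streaming edge partitioning biased by attribute clusters, the sketch below first groups nodes by their attribute vectors with a tiny k-means, then assigns each edge in a single pass to the partition with the best score, combining endpoint locality (reuse partitions that already hold a replica), attribute affinity (prefer the partition associated with each endpoint's cluster), and a load-balance penalty. Only per-node replica sets and per-partition loads are kept in memory, not the full adjacency. The functions `kmeans` and `stream_partition`, the weights `alpha` and `beta`, and the `cluster % num_parts` mapping are hypothetical choices for this sketch, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def kmeans(points: Dict[int, Sequence[float]], k: int, iters: int = 10) -> Dict[int, int]:
    """Tiny k-means over node attribute vectors; returns node -> cluster id."""
    ids = sorted(points)
    # Deterministic init (an assumption for reproducibility): first k nodes by id.
    centroids = [list(points[i]) for i in ids[:k]]
    assign: Dict[int, int] = {}
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i in ids:
            x = points[i]
            assign[i] = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
            )
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [points[i] for i in ids if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def stream_partition(
    edges: List[Tuple[int, int]],
    cluster_of: Dict[int, int],
    num_parts: int,
    alpha: float = 1.0,  # hypothetical weight of the attribute-affinity term
    beta: float = 1.0,   # hypothetical weight of the load-balance penalty
) -> Tuple[List[int], Dict[int, Set[int]]]:
    """One-pass greedy vertex-cut edge partitioning with an attribute-cluster bonus."""
    replicas: Dict[int, Set[int]] = defaultdict(set)  # node -> partitions holding a copy
    load = [0] * num_parts                            # edges assigned per partition
    assignment: List[int] = []
    for u, v in edges:  # each edge is seen exactly once, in arrival order
        max_load, min_load = max(load), min(load)
        best_p, best_score = 0, float("-inf")
        for p in range(num_parts):
            # Locality: favor partitions that already replicate an endpoint.
            locality = (p in replicas[u]) + (p in replicas[v])
            # Affinity: favor the partition mapped to each endpoint's attribute cluster.
            affinity = (cluster_of[u] % num_parts == p) + (cluster_of[v] % num_parts == p)
            # Balance: penalize partitions fuller than the emptiest one.
            balance = (load[p] - min_load) / (max_load - min_load + 1)
            score = locality + alpha * affinity - beta * balance
            if score > best_score:  # ties keep the lowest partition id
                best_p, best_score = p, score
        assignment.append(best_p)
        replicas[u].add(best_p)
        replicas[v].add(best_p)
        load[best_p] += 1
    return assignment, replicas


# Toy graph: two attribute groups (nodes 0-3 near (0, 0), nodes 4-7 near (10, 10)).
points = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1),
          4: (10, 10), 5: (10, 11), 6: (11, 10), 7: (11, 11)}
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (4, 5), (5, 6), (6, 7), (4, 7), (3, 4)]

cluster_of = kmeans(points, k=2)
parts, replicas = stream_partition(edges, cluster_of, num_parts=2)
replication_factor = sum(len(s) for s in replicas.values()) / len(replicas)
print("edge -> partition:", parts)
print(f"replication factor: {replication_factor:.3f}")
```

On this toy input, each attribute group's edges land in one partition and only the single cross-group edge forces a node to be replicated, which is the kind of semantic locality the abstract argues for. A real system would tune the weights and handle far larger streams; this sketch only shows the shape of the scoring loop.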