Abstract: Existing distributed graph training frameworks evenly partition a large graph into small chunks for distributed storage, use an independent interface to access neighbors, and train Graph Neural Networks (GNNs) on a cluster of machines to update weights. However, they design storage and training separately, which incurs a high communication cost for retrieving neighborhoods. In the storage phase, traditional heuristic graph partitioning not only suffers from memory overhead, because it loads the full graph into memory, but also ignores informative node attributes that could guide node or edge assignment by grouping semantically related structures (e.g., user interests or item categories in a recommendation graph). Moreover, in the weight update phase, synchronization by direct averaging copes poorly with heterogeneous local models, since each machine's data is loaded from a different subgraph, and this results in slow convergence. To solve these problems, we propose a novel distributed graph training approach, \textit{Attribute-driven Streaming Edge Partitioning with Reconciliations} (ASEPR), in which each local model loads only the subgraph stored on its own machine, reducing communication. ASEPR first clusters nodes with similar attributes into the same partition to preserve semantic structure and multi-hop neighbor locality. Streaming partitioning combined with attribute clustering is then applied to subgraph assignment to alleviate memory overhead. After the local GNNs are trained on the distributed machines, we deploy cross-layer reconciliation strategies for the heterogeneous local models, improving the averaged global model through knowledge distillation and contrastive learning. Extensive experiments on four large graph datasets, covering node classification and link prediction tasks, show that our model outperforms DistDGL with lower resource requirements and nearly 2x faster convergence.
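
To make the idea of streaming edge partitioning guided by attribute clusters more concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that attribute-cluster ids have already been computed (for example by k-means over node features) and that cluster c is mapped to partition c modulo the number of partitions. The function name `stream_partition`, the scoring terms, and the `balance_weight` parameter are all hypothetical choices made for illustration. Edges are consumed one at a time, so the full graph never has to be held in memory.

```python
from collections import defaultdict


def stream_partition(edges, cluster_of, num_parts, balance_weight=1.0):
    """Greedily assign each streamed edge to one partition.

    edges       : iterable of (u, v) pairs, consumed once in arrival order
    cluster_of  : dict mapping node -> attribute-cluster id (assumed precomputed)
    num_parts   : number of partitions (machines)
    Returns (assignment, loads, replicas). Illustrative heuristic only.
    """
    loads = [0] * num_parts            # edges stored on each partition
    replicas = defaultdict(set)        # node -> partitions that hold a copy of it
    assignment = []                    # partition chosen for each edge, in order

    for u, v in edges:
        max_load, min_load = max(loads), min(loads)
        best_part, best_score = 0, float("-inf")
        for p in range(num_parts):
            # Locality: prefer partitions that already hold an endpoint.
            locality = (p in replicas[u]) + (p in replicas[v])
            # Attribute affinity (assumption): cluster c prefers partition c % num_parts.
            affinity = (cluster_of[u] % num_parts == p) + (cluster_of[v] % num_parts == p)
            # Balance: prefer lightly loaded partitions, scaled to [0, 1).
            balance = (max_load - loads[p]) / (1 + max_load - min_load)
            score = locality + affinity + balance_weight * balance
            if score > best_score:     # ties go to the lowest partition id
                best_part, best_score = p, score
        loads[best_part] += 1
        replicas[u].add(best_part)
        replicas[v].add(best_part)
        assignment.append(best_part)

    return assignment, loads, replicas
```

Because the function only iterates over `edges` once, it can be fed a generator that reads edges from disk. The replica sets it returns indicate which nodes are cut across machines, which is the quantity an edge partitioner tries to keep small.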

An Experimental Comparison of Partitioning Strategies for Distributed Graph Neural Network Training

Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism

Attribute-Driven Streaming Edge Partitioning with Reconciliations for Distributed Graph Neural Networks Training

Leiden-Fusion Partitioning Method for Effective Distributed Training of Graph Embeddings

Distributed Training of Large Graph Neural Networks with Variable Communication Rates

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration

ByteGNN: Efficient Graph Neural Network Training at Large Scale

Distributed Graph Neural Network Training: A Survey

Graph neural networks meet with distributed graph partitioners and reconciliations

CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks

DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs

Graph Neural Network Training with Data Tiering

BatchGNN: Efficient CPU-Based Distributed GNN Training on Very Large Graphs

Scalable and Efficient Full-Graph GNN Training for Large Graphs

A Comprehensive Survey on Distributed Training of Graph Neural Networks

Scalable Neural Network Training over Distributed Graphs

Large Scale Training of Graph Neural Networks for Optimal Markov-Chain Partitioning Using the Kemeny Constant

Uplifting the Expressive Power of Graph Neural Networks through Graph Partitioning