Abstract: Existing distributed graph training frameworks evenly partition a large graph into small chunks for distributed storage, access neighbors through an independent interface, and train Graph Neural Networks (GNNs) on a cluster of machines to update weights. However, they design storage and training separately, which incurs a large communication cost for retrieving neighborhoods. During the storage phase, traditional heuristic graph partitioning not only suffers from memory overhead because it loads the full graph into memory, but also ignores meaningful node attributes that could guide node or edge assignment by grouping semantically related structures (e.g., user interests or item categories in a recommendation graph). Moreover, in the weight update phase, synchronization by direct averaging struggles with heterogeneous local models, where each machine's data is loaded from a different subgraph, resulting in slow convergence. To solve these problems, we propose a novel distributed graph training approach, Attribute-driven Streaming Edge Partitioning with Reconciliations (ASEPR), in which each local model loads only the subgraph stored on its own machine to reduce communication. ASEPR first clusters nodes with similar attributes into the same partition to preserve semantic structure and multi-hop neighbor locality. Streaming partitioning combined with attribute clustering is then applied to subgraph assignment to alleviate memory overhead. After local GNN training on the distributed machines, we deploy cross-layer reconciliation strategies for the heterogeneous local models to improve the averaged global model through knowledge distillation and contrastive learning. Extensive experiments on four large graph datasets, covering node classification and link prediction tasks, show that our model outperforms DistDGL with fewer resource requirements and an almost 2x speedup in convergence.
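The abstract does not spell out the partitioning rule, so the following is only a minimal sketch of how a one-pass streaming edge partitioner could use attribute clusters as a hint. It should not be read as the ASEPR algorithm itself. The function name `stream_partition`, the scoring terms, the `capacity` parameter, and the mapping of a cluster to a partition by `cluster_id % num_parts` are all assumptions made for illustration. The attribute clusters are assumed to be precomputed.

```python
from collections import defaultdict


def stream_partition(edges, cluster_of, num_parts, capacity):
    """Assign each streamed edge to one partition in a single pass.

    edges:      iterable of (u, v) pairs, consumed once (never held in full).
    cluster_of: dict mapping node -> attribute cluster id (assumed precomputed).
    num_parts:  number of partitions (machines).
    capacity:   maximum number of edges per partition (hypothetical balance cap).
    """
    load = [0] * num_parts
    replicas = defaultdict(set)  # node -> partitions already holding a copy
    assignment = {}

    for u, v in edges:
        best, best_score = None, None
        for p in range(num_parts):
            if load[p] >= capacity:
                continue  # partition is full, skip it
            score = 0.0
            # Locality: prefer partitions that already hold an endpoint.
            score += (p in replicas[u]) + (p in replicas[v])
            # Attribute hint: prefer the partition tied to an endpoint's cluster.
            score += (cluster_of[u] % num_parts == p)
            score += (cluster_of[v] % num_parts == p)
            # Balance: penalize partitions that are already loaded.
            score -= load[p] / capacity
            if best_score is None or score > best_score:
                best, best_score = p, score
        if best is None:
            raise ValueError("all partitions are at capacity")
        assignment[(u, v)] = best
        load[best] += 1
        replicas[u].add(best)
        replicas[v].add(best)

    return assignment, load


# Toy graph: nodes 0-2 share attribute cluster 0, nodes 3-5 share cluster 1.
toy_edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
toy_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
toy_assignment, toy_load = stream_partition(toy_edges, toy_clusters, 2, 3)
print(toy_assignment, toy_load)
```

Because each edge is scored against the current partition state and then discarded, memory use grows with the number of nodes seen rather than with the full edge list, which is the property the abstract attributes to streaming partitioning.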

Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling

Scientific Computing Algorithms to Learn Enhanced Scalable Surrogates for Mesh Physics

X-MeshGraphNet: Scalable Multi-Scale Graph Neural Networks for Physics Simulation

Mesh-based Super-Resolution of Fluid Flows with Multiscale Graph Neural Networks

Multiscale graph neural networks with adaptive mesh refinement for accelerating mesh-based simulations

Graph Neural Networks for Mesh Generation and Adaptation in Structural and Fluid Mechanics

Mesh-based GNN surrogates for time-independent PDEs

Attribute-Driven Streaming Edge Partitioning with Reconciliations for Distributed Graph Neural Networks Training

Sampling-based Distributed Training with Message Passing Neural Network

Scalable Neural Network Training over Distributed Graphs

DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs

Graph neural networks meet with distributed graph partitioners and reconciliations

Scalable and Efficient Full-Graph GNN Training for Large Graphs

Efficient scaling of dynamic graph neural networks

On the Scalability of GNNs for Molecular Graphs

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

Towards Efficient Large-Scale Graph Neural Network Computing

ByteGNN: Efficient Graph Neural Network Training at Large Scale

A Graph Neural Network Approach to Dispersed Systems

GNN at the Edge: Cost-Efficient Graph Neural Network Processing over Distributed Edge Servers

HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture