Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Borui Wan, Juntao Zhao, Chuan Wu

2023-06-02

Abstract: Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings and embedding gradients (all referred to as messages) across devices bring significant communication overhead for nodes with remote neighbors on other devices (marginal nodes) and unnecessary waiting time for nodes without remote neighbors (central nodes) in the training graph. This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph GNN training. We stochastically quantize messages transferred across devices to lower-precision integers for communication traffic reduction and advocate communication-computation parallelization between marginal nodes and central nodes. We provide theoretical analysis to prove fast training convergence (at the rate of O(T^{-1}) with T being the total number of training epochs) and design an adaptive quantization bit-width assignment scheme for each message based on the analysis, targeting a good trade-off between training convergence and efficiency. Extensive experiments on mainstream graph datasets show that AdaQP substantially improves distributed full-graph training's throughput (up to 3.01×) with negligible accuracy drop (at most 0.30%) or even accuracy improvement (up to 0.19%) in most cases, showing significant advantages over the state-of-the-art works.
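To make the idea of stochastically quantizing messages to lower-precision integers concrete, here is a minimal sketch of uniform quantization with stochastic rounding. It is an illustration under stated assumptions, not AdaQP's actual implementation: the function names `stochastic_quantize` and `dequantize`, the per-message min/max scaling, and the pure-Python list representation are all hypothetical choices made for this example.

```python
import random


def stochastic_quantize(values, bits, rng=random):
    """Map floats to integer codes in [0, 2**bits - 1] using stochastic rounding.

    Assumption: one (offset, scale) pair per message, derived from its min/max.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in values:
        x = (v - lo) / scale      # real-valued position on the integer grid
        floor = int(x)            # x >= 0, so int() acts as floor
        frac = x - floor
        # Round up with probability equal to the fractional part, so the
        # dequantized value equals the original in expectation.
        q = floor + (1 if rng.random() < frac else 0)
        codes.append(min(q, levels))
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    message = [-1.0, -0.25, 0.1, 0.7, 2.0]
    codes, lo, scale = stochastic_quantize(message, bits=4, rng=random.Random(0))
    print(codes)
    print(dequantize(codes, lo, scale))
```

Sending 4-bit codes instead of 32-bit floats would cut the payload roughly eightfold, at the cost of a per-element error of at most one quantization step. Because the rounding is unbiased, this error averages out across many messages, which is the property a convergence analysis of this kind typically relies on.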

Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing

What problem does this paper attempt to address?

The paper targets the high bandwidth demand and long training time of distributed full-graph Graph Neural Network (GNN) training on large graphs. The cost comes from the frequent exchange of node features, embeddings, and embedding gradients (collectively referred to as messages) across devices. For marginal nodes (nodes with remote neighbors on other devices), this exchange causes significant communication overhead; for central nodes (nodes whose neighbors are all on the local device), it causes unnecessary waiting time. To address both problems, the authors propose AdaQP, an efficient GNN training system that accelerates distributed full-graph GNN training in two ways:

1. **Adaptive Quantization of Messages**: Messages transferred across devices are stochastically quantized to lower-precision integers, which reduces communication traffic.
2. **Parallelization of Computation and Communication**: On each device, the computation of central nodes is overlapped with the message transfer of marginal nodes, which improves training speed and resource utilization.

The paper also provides a theoretical analysis proving that training with AdaQP converges at a rate of O(T^{-1}), where T is the total number of training epochs. Based on this analysis, it designs an adaptive quantization bit-width assignment scheme for each message that balances training convergence against efficiency. Experimental results show that AdaQP significantly reduces communication time and improves training throughput (up to 3.01×), with a negligible accuracy drop (at most 0.30%) or even an accuracy improvement (up to 0.19%) in most cases.
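The overlap of central-node computation with marginal-node communication can be sketched as follows. This is a toy illustration under loud assumptions, not the system's real code: `fetch_remote_messages` is a hypothetical stand-in that sleeps to mimic network latency, `aggregate` is a plain mean over neighbors, and a background thread replaces whatever asynchronous communication mechanism an actual GPU training system would use.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_remote_messages(remote_ids, delay=0.05):
    """Hypothetical cross-device exchange: sleeps, then returns fake embeddings."""
    time.sleep(delay)
    return {nid: float(nid) for nid in remote_ids}


def aggregate(node, neighbors, embeddings):
    """Mean aggregation over a node's neighbors (a simple stand-in for a GNN layer)."""
    nbrs = neighbors[node]
    return sum(embeddings[n] for n in nbrs) / len(nbrs)


def train_step(local_emb, neighbors, central, marginal, remote_ids):
    out = {}
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the transfer needed by marginal nodes in the background.
        pending = pool.submit(fetch_remote_messages, remote_ids)
        # Central nodes need only local data, so they are computed while
        # the transfer is still in flight.
        for v in central:
            out[v] = aggregate(v, neighbors, local_emb)
        # Block only now, when remote messages are actually required.
        all_emb = {**local_emb, **pending.result()}
    for v in marginal:
        out[v] = aggregate(v, neighbors, all_emb)
    return out


if __name__ == "__main__":
    local_emb = {0: 1.0, 1: 3.0, 2: 5.0}
    neighbors = {0: [1, 2], 1: [0, 2], 2: [1, 10]}  # node 10 lives on another device
    print(train_step(local_emb, neighbors, central=[0, 1], marginal=[2], remote_ids=[10]))
```

The design point is that central nodes never wait on the network: their work fills the time the transfer would otherwise leave idle, and only marginal nodes depend on the arrival of remote messages.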

Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication

ByteGNN: Efficient Graph Neural Network Training at Large Scale

A Graph Neural Network Based Decentralized Learning Scheme

CDFGNN: a Systematic Design of Cache-based Distributed Full-Batch Graph Neural Network Training with Communication Reduction

Scalable and Efficient Full-Graph GNN Training for Large Graphs

Fully Distributed Online Training of Graph Neural Networks in Networked Systems

Distributed Training of Large Graph Neural Networks with Variable Communication Rates

QGABS: GPU Tensor Core-accelerated Quantized Graph Neural Network based on Adaptive Batch Size

Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration

Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

Distributed Graph Neural Network Training: A Survey

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

BiFeat: Supercharge GNN Training via Graph Feature Quantization

HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration

Design of Retransmission Mechanism for Decentralized Inference with Graph Neural Networks

Attribute-Driven Streaming Edge Partitioning with Reconciliations for Distributed Graph Neural Networks Training

GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy

BatchGNN: Efficient CPU-Based Distributed GNN Training on Very Large Graphs

Communication-Free Distributed GNN Training with Vertex Cut