Abstract: Graph Neural Networks (GNNs) have shown clear advantages on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks. As a practical way to train GNNs on large graphs with billions of nodes and edges, sampling-based training is widely adopted by existing training frameworks. However, through an in-depth analysis, we observe that the efficiency of existing sampling-based training frameworks remains limited by key bottlenecks in all three phases of sampling-based training, i.e., subgraph sampling, memory IO, and computation. To this end, we propose FastGL, a GPU-efficient Framework for accelerating sampling-based training of GNNs at Large scale by simultaneously optimizing all three phases, taking into account both GPU characteristics and graph structure. Specifically, by exploiting the inherent overlap within graph structures, FastGL develops the Match-Reorder strategy to reduce data traffic, which accelerates memory IO without incurring any GPU memory overhead. Additionally, FastGL leverages a Memory-Aware computation method that harnesses the hierarchical nature of GPU memory to mitigate irregular data access during computation. FastGL further incorporates the Fused-Map approach to reduce synchronization overhead during sampling. Extensive experiments demonstrate that FastGL achieves average speedups of 11.8x, 2.2x, and 1.5x over the state-of-the-art frameworks PyG, DGL, and GNNLab, respectively. Our code is available at https://github.com/a1bc2def6g/fastgl-ae.
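To make the data-traffic argument concrete, the following is a minimal sketch of one possible reading of "overlap": consecutive sampled subgraphs share nodes, so features already resident from the previous mini-batch need not be transferred again. This is an illustration under that assumption, not FastGL's actual Match-Reorder implementation; the function names `plan_feature_loads` and `traffic_saved` are hypothetical, and node IDs are assumed unique within a batch.

```python
def plan_feature_loads(prev_nodes, next_nodes):
    """Split next_nodes into features that can be reused and features to fetch.

    Assumption (not from the paper): features of every node in prev_nodes
    are still resident when next_nodes is processed.
    """
    resident = set(prev_nodes)
    reuse = [n for n in next_nodes if n in resident]
    fetch = [n for n in next_nodes if n not in resident]
    return reuse, fetch


def traffic_saved(batches):
    """Fraction of per-node feature loads avoided over a sequence of batches."""
    if not batches:
        return 0.0
    total = sum(len(b) for b in batches)
    if total == 0:
        return 0.0
    # The first batch has nothing to reuse, so all of it is fetched.
    fetched = len(batches[0])
    for prev, nxt in zip(batches, batches[1:]):
        _, fetch = plan_feature_loads(prev, nxt)
        fetched += len(fetch)
    return 1 - fetched / total


if __name__ == "__main__":
    # Toy node-ID sets standing in for sampled subgraphs.
    batches = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
    print(plan_feature_loads(batches[0], batches[1]))
    print(traffic_saved(batches))
```

In this toy model, the saving depends entirely on how much adjacent batches overlap, which suggests why the order in which subgraphs are processed could matter for memory IO.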

Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs

PaGraph

ByteGNN: Efficient Graph Neural Network Training at Large Scale

Scalable and Efficient Full-Graph GNN Training for Large Graphs

BatchGNN: Efficient CPU-Based Distributed GNN Training on Very Large Graphs

Auto-Divide GNN: Accelerating GNN Training with Subgraph Division

SCGraph: Accelerating Sample-based GNN Training by Staged Caching of Features on GPUs

Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration

BGS: Accelerate GNN Training on Multiple GPUs

FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale

Accelerate Graph Neural Network Training by Reusing Batch Data on GPUs

GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Graph Neural Network Training with Data Tiering

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing

PCGraph: Accelerating GNN Inference on Large Graphs via Partition Caching

CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks