GLISP: A Scalable GNN Learning System by Exploiting Inherent Structural Properties of Graphs

Zhongshu Zhu, Bin Jing, Xiaopei Wan, Zhizhen Liu, Lei Liang, Jun Zhou

2024-01-06

Abstract: As a powerful tool for modeling graph data, Graph Neural Networks (GNNs) have received increasing attention in both academia and industry. Nevertheless, it is notoriously difficult to deploy GNNs on industrial-scale graphs because of their huge data size and complex topological structures. In this paper, we propose GLISP, a sampling-based GNN learning system for industrial-scale graphs. By exploiting the inherent structural properties of graphs, such as power-law distribution and data locality, GLISP addresses the scalability and performance issues that arise at different stages of the graph learning process. GLISP consists of three core components: a graph partitioner, a graph sampling service, and a graph inference engine. The graph partitioner adopts the proposed vertex-cut graph partitioning algorithm AdaDNE to produce balanced partitions for power-law graphs, which is essential for sampling-based GNN systems. The graph sampling service employs a load-balancing design that allows the one-hop sampling requests of high-degree vertices to be handled by multiple servers. Combined with a memory-efficient data structure, this design improves both efficiency and scalability. The graph inference engine splits the $K$-layer GNN into $K$ slices and caches the vertex embeddings produced by each slice in a data-locality-aware hybrid caching system for reuse, thus completely eliminating the redundant computation caused by data dependencies in the graph. Extensive experiments show that GLISP achieves up to $6.53\times$ and $70.77\times$ speedups over existing GNN systems for training and inference tasks, respectively, and can scale to a graph with over 10 billion vertices and 40 billion edges with limited resources.
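To make the load-balancing idea concrete, here is a minimal sketch of how a one-hop sampling request for a high-degree vertex could be served by several servers once a vertex-cut partitioner has spread that vertex's edges across them. This is not GLISP's actual implementation: the function names `local_sample` and `merged_sample`, the list-of-lists `shards` layout, and the random-key merge are all assumptions chosen for illustration. Each server tags its local neighbors with independent random keys and returns its k smallest; the client keeps the k smallest keys overall, which yields a uniform sample without replacement over the full neighbor list.

```python
import heapq
import random


def local_sample(neighbors, k, rng):
    """Server side (hypothetical): tag each local neighbor with a random key
    and return the k entries with the smallest keys."""
    keyed = [(rng.random(), v) for v in neighbors]
    return heapq.nsmallest(k, keyed)


def merged_sample(shards, k, seed=0):
    """Client side (hypothetical): gather per-server candidates and keep the
    k smallest keys overall. A single RNG stands in for per-server RNGs."""
    rng = random.Random(seed)
    candidates = []
    for neighbors in shards:
        candidates.extend(local_sample(neighbors, k, rng))
    return [v for _, v in heapq.nsmallest(k, candidates)]


# Assumed example: one high-degree vertex whose 10 neighbors live on 3 servers.
shards = [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
sample = merged_sample(shards, k=4)
```

In this sketch each server scans only its own share of the neighbor list, so the cost of sampling a hub vertex is divided among the servers rather than concentrated on one.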

Machine Learning

What problem does this paper attempt to address?

The paper addresses the scalability and performance problems that arise when deploying Graph Neural Networks (GNNs) on industrial-scale graph data. Because real-world graphs are very large and topologically complex, existing GNN systems run into memory limits, limited computational resources, and uneven data distribution, all of which make it difficult to train GNNs and run inference at this scale. To tackle these challenges, the paper proposes GLISP, a system that leverages the inherent structural properties of graphs (such as power-law distribution and data locality). GLISP consists of three core components: the graph partitioner, the graph sampling service, and the graph inference engine. Each component targets one of the issues above: the partitioner balances power-law graphs across machines, the sampling service spreads the sampling load of high-degree vertices over multiple servers, and the inference engine caches per-layer embeddings to avoid redundant computation. Together they allow the system to operate on graphs with billions of vertices and edges.
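The layer-sliced inference described in the abstract can be illustrated with a small sketch. It is a simplified stand-in, not the paper's engine: the function name `layerwise_inference`, the scalar features, the mean aggregator, and the in-memory list used as a cache are all assumptions made to keep the example short. The point it shows is that each slice reads only the cached output of the previous slice, so every vertex's embedding at each layer is computed exactly once instead of being recomputed for every target vertex whose K-hop neighborhood contains it.

```python
def layerwise_inference(adj, features, num_layers):
    """Compute embeddings one slice (layer) at a time, caching each slice's output.

    adj: dict mapping vertex -> list of neighbor vertices (assumed layout)
    features: dict mapping vertex -> float (scalar features for brevity)
    Returns a list where cache[k] holds every vertex's layer-k embedding.
    """
    cache = [dict(features)]  # cache[0] holds the input features
    for _ in range(num_layers):
        prev = cache[-1]
        current = {}
        for v, nbrs in adj.items():
            # Toy slice: mean over the vertex itself and its neighbors,
            # read from the previous slice's cached embeddings.
            vals = [prev[v]] + [prev[u] for u in nbrs]
            current[v] = sum(vals) / len(vals)
        cache.append(current)
    return cache


# Assumed toy graph: vertex 0 is connected to vertices 1 and 2.
adj = {0: [1, 2], 1: [0], 2: [0]}
features = {0: 3.0, 1: 0.0, 2: 0.0}
cache = layerwise_inference(adj, features, num_layers=2)
```

With this structure the total work is proportional to the number of layers times the graph size, whereas expanding a separate K-hop neighborhood per target vertex repeats the same intermediate embeddings many times.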

AGL: a Scalable System for Industrial-purpose Graph Machine Learning

Scalable and Efficient Full-Graph GNN Training for Large Graphs

FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

ByteGNN: Efficient Graph Neural Network Training at Large Scale

InferTurbo: A Scalable System for Boosting Full-graph Inference of Graph Neural Network over Huge Graphs

DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs

GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching

CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks

SCGraph: Accelerating Sample-based GNN Training by Staged Caching of Features on GPUs

GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs

Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs

Scalable Graph Neural Networks Via Bidirectional Propagation

PaGraph

Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

Efficient Graph Neural Network Inference at Large Scale

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

PCGraph: Accelerating GNN Inference on Large Graphs via Partition Caching

FlexGraph: a flexible and efficient distributed framework for GNN training

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication