Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Mucong Ding, Tahseen Rabbani, Bang An, Evan Z Wang, Furong Huang

2024-06-22

Abstract: Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational complexities with respect to the number of GNN layers). Various sampling-based and historical-embedding-based methods are proposed to avoid this exponential growth of complexities. However, none of these solutions eliminates the linear dependence on graph size. This paper proposes a sketch-based algorithm whose training time and memory grow sublinearly with respect to graph size by training GNNs atop a few compact sketches of graph adjacency and node embeddings. Based on polynomial tensor-sketch (PTS) theory, our framework provides a novel protocol for sketching non-linear activations and graph convolution matrices in GNNs, as opposed to existing methods that sketch linear weights or gradients in neural networks. In addition, we develop a locality-sensitive hashing (LSH) technique that can be trained to improve the quality of sketches. Experiments on large-graph benchmarks demonstrate the scalability and competitive performance of our Sketch-GNNs versus their full-size GNN counterparts.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the memory and compute cost of training Graph Neural Networks (GNNs) on large graphs. Full-graph training requires keeping the entire adjacency matrix and all node embeddings in memory, which is often infeasible. Mini-batch sampling avoids that, but its cost grows exponentially with the number of GNN layers. Sampling-based and historical-embedding-based methods curb this exponential growth, yet their cost still depends linearly on graph size.

The paper proposes a framework called Sketch-GNN whose training time and memory grow sublinearly with graph size, meaning that the required resources grow more slowly than the graph itself. This is accomplished by creating compact "sketch" representations of the graph adjacency matrix and node feature matrix before training. These sketches approximate the original matrices at a lower dimension, reducing both memory requirements and computation.

The core contributions of Sketch-GNN include:

1. **Nonlinear Activation Sketch Representation**: Uses Polynomial Tensor Sketch (PTS) theory to sketch non-linear activations and graph convolution matrices, so that training stays in the sketched space without reverting to the high-dimensional one while retaining prediction accuracy.
2. **Learnable Locality-Sensitive Hashing (LSH)**: Learns and updates the locality-sensitive hash functions online, adaptively improving sketch quality and reducing performance loss.
3. **Sublinear Complexity**: Experiments show that Sketch-GNN keeps prediction accuracy competitive with full-size GNNs while its training complexity grows sublinearly with graph size.

Compared to existing methods, Sketch-GNN shows better scalability and efficiency across different types of GNNs (such as GCN and GraphSAGE), especially on large graph datasets. The method also avoids the long preprocessing times common in some graph compression-based methods.
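The idea of compressing a node feature matrix into a small sketch can be illustrated with a count sketch, a standard sketching primitive. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not say which sketch construction is used, and the helper `count_sketch` and the sizes `n`, `d`, `c` are hypothetical names chosen for this example. Each of the `n` rows is added, with a random sign, into one of `c` buckets, so the sketch has `c` rows regardless of how large `n` is.

```python
import numpy as np

def count_sketch(X, buckets, signs, c):
    """Compress the n rows of X into c rows.

    Row i of X is added to row buckets[i] of the sketch, multiplied by signs[i].
    """
    S = np.zeros((c, X.shape[1]))
    # np.add.at accumulates correctly when several rows hash to the same bucket.
    np.add.at(S, buckets, signs[:, None] * X)
    return S

rng = np.random.default_rng(0)
n, d, c = 5000, 16, 256            # hypothetical sizes: nodes, features, sketch rows
X = rng.standard_normal((n, d))    # stand-in for a node feature matrix

buckets = rng.integers(0, c, size=n)                 # random hash: node -> bucket
signs = rng.choice(np.array([-1.0, 1.0]), size=n)    # random hash: node -> sign

SX = count_sketch(X, buckets, signs, c)

# The sketch preserves squared norms in expectation, so the two values should be close.
exact = np.sum(X * X)
approx = np.sum(SX * SX)
rel_err = abs(approx - exact) / exact
print(SX.shape, rel_err)
```

The sketch is linear in `X`, which is why linear operations are easy to carry out in sketched form; handling non-linear activations is the harder part that the PTS-based contribution targets.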
In summary, the paper targets efficient GNN training on large-scale graphs and proposes a sketch-based approach that reduces compute and memory requirements while keeping prediction accuracy competitive with full-size GNNs.
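The locality-sensitive hashing idea mentioned among the contributions can be illustrated with SimHash (random-hyperplane hashing), a classic fixed LSH scheme. This is only a sketch of the general principle: the hash here is random and not learned, so it is an assumption-laden stand-in rather than the paper's learnable variant, and the helper `simhash` and the sizes `d` and `k` are hypothetical. The point it shows is that nearby vectors agree on most hash bits while unrelated vectors agree on roughly half.

```python
import numpy as np

def simhash(X, planes):
    """Return one boolean code per row of X.

    Bit j records on which side of random hyperplane j the row falls.
    """
    return (X @ planes.T) >= 0

rng = np.random.default_rng(0)
d, k = 32, 128                         # hypothetical embedding size and number of hash bits
planes = rng.standard_normal((k, d))   # one random hyperplane per bit

a = rng.standard_normal(d)
near = a + 0.1 * rng.standard_normal(d)   # small perturbation of a
far = rng.standard_normal(d)              # unrelated vector

codes = simhash(np.stack([a, near, far]), planes)

# Fraction of hash bits on which each pair agrees.
agree_near = np.mean(codes[0] == codes[1])
agree_far = np.mean(codes[0] == codes[2])
print(agree_near, agree_far)
```

A learnable scheme would adjust the hash functions during training instead of drawing `planes` once at random.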

ByteGNN: Efficient Graph Neural Network Training at Large Scale

Scalable Graph Neural Networks Via Bidirectional Propagation

GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs

SketchGNN: Semantic Sketch Segmentation with Graph Neural Networks

Scalable and Efficient Full-Graph GNN Training for Large Graphs

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication

Graph Batch Coarsening Framework for Scalable Graph Neural Networks

Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

Feature-Oriented Sampling for Fast and Scalable GNN Training

Accurate, Efficient and Scalable Graph Embedding

Blocking-based Neighbor Sampling for Large-scale Graph Neural Networks

SCGraph: Accelerating Sample-based GNN Training by Staged Caching of Features on GPUs

Scaling Up Graph Neural Networks Via Graph Coarsening

TinyGNN: Learning Efficient Graph Neural Networks

Training Graph Neural Networks on Growing Stochastic Graphs

Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks

GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets

Training Large-Scale Graph Neural Networks Via Graph Partial Pooling

Efficient scaling of dynamic graph neural networks

Graph Neural Networks Inspired by Classical Iterative Algorithms