Abstract: We study the training of Graph Neural Networks (GNNs) for large-scale graphs. We revisit the premise of using distributed training for billion-scale graphs and show that for graphs that fit in main memory or the SSD of a single machine, out-of-core pipelined training with a single GPU can outperform state-of-the-art (SoTA) multi-GPU solutions. We introduce MariusGNN, the first system that utilizes the entire storage hierarchy, including disk, for GNN training. MariusGNN introduces a series of data organization and algorithmic contributions that 1) minimize the end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in memory. We evaluate MariusGNN against SoTA systems for learning GNN models and find that single-GPU training in MariusGNN achieves the same level of accuracy up to 8x faster than multi-GPU training in these systems, thus introducing an order-of-magnitude reduction in monetary cost. MariusGNN is open-sourced at http://www.marius-project.org.
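Out-of-core pipelined training hides disk latency by loading upcoming graph data while the current chunk is being trained on. The code below is a minimal, generic sketch of that overlap. It uses a bounded producer/consumer queue and assumes the graph is split into partitions that can be loaded one at a time. `prefetch`, `load_partition`, and `train_epoch` are hypothetical names, and the loader only simulates a disk read. This is not MariusGNN's actual pipeline, and it does not model its partition replacement policy.

```python
import queue
import threading
import time
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the partition stream


def prefetch(load: Callable[[int], List[float]],
             partition_ids: Iterable[int],
             depth: int = 2) -> Iterator[List[float]]:
    """Yield partitions in order while a background thread loads up to `depth` ahead."""
    buf: "queue.Queue[object]" = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for pid in partition_ids:
                buf.put(load(pid))  # blocks once `depth` partitions are waiting
        except Exception as exc:  # forward loader errors to the consumer
            buf.put(exc)
        buf.put(_DONE)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        item = buf.get()
        if item is _DONE:
            break
        if isinstance(item, Exception):
            raise item
        yield item
    worker.join()


# Hypothetical loader: stands in for reading one partition's features from disk.
def load_partition(pid: int) -> List[float]:
    time.sleep(0.01)  # simulated I/O latency; sleep releases the GIL, so it overlaps
    return [float(pid)] * 4


def train_epoch(partition_ids: Iterable[int]) -> float:
    total = 0.0
    for features in prefetch(load_partition, partition_ids):
        total += sum(features)  # stand-in for a GPU training step on this partition
    return total


if __name__ == "__main__":
    print(train_epoch(range(5)))  # 40.0
```

The bounded queue caps how many partitions are resident at once (`depth`). That property is what lets this style of training proceed when the whole graph does not fit in memory.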

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to train graph neural networks (GNNs) on large-scale graph data in a resource-efficient way. Specifically, the paper explores whether a single machine that leverages the entire storage hierarchy, including disk, can train GNNs more efficiently and at lower cost than existing distributed multi-GPU systems.

### Problem Background

As graph neural networks (GNNs) are increasingly applied to large-scale graph data such as social networks and knowledge graphs, GNN training faces two major challenges:

1. **Large-scale graph data**: Graphs in production environments often contain millions of nodes and billions of edges, along with feature vectors attached to those nodes and edges, so they require a large amount of storage.
2. **High computational complexity of GNNs**: Each GNN layer aggregates information from a node's neighbors, so a multi-layer model depends on each node's multi-hop neighborhood, and the data flow graph grows exponentially with the number of layers.

Together, these challenges mean that existing GNN training methods, particularly distributed ones, suffer from insufficient hardware resources and low training efficiency.

### Core Problems of the Paper

The paper poses the following core questions:

- **When is distributed GNN training actually required?** In particular, when the graph fits in the main memory or on the SSD of a single machine, is a complex distributed system really necessary?
- **How can GNN training be made efficient on a single machine?** By using disk and optimizing the sampling algorithms, can training be faster and cheaper than on multi-GPU distributed systems?
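The exponential growth described in point 2 can be made concrete with a quick calculation. Assume fixed per-hop sampling fanouts $f$ and a $k$-layer model. Before any duplicate nodes are merged, one target node then depends on up to $1 + f + f^2 + \dots + f^k$ nodes. The fanout of 10 and the 128-dimensional float32 features below are illustrative assumptions, not values from the paper.

```python
from typing import Sequence


def sampled_nodes_upper_bound(fanouts: Sequence[int]) -> int:
    """Max nodes in one target's k-hop sampled computation graph (no duplicate merging).

    fanouts[i] is the number of neighbors sampled per node at hop i + 1.
    """
    total, frontier = 1, 1  # start with the target node itself
    for fanout in fanouts:
        frontier *= fanout  # every frontier node contributes `fanout` samples
        total += frontier
    return total


def feature_bytes(num_nodes: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to gather float32 feature vectors for `num_nodes` nodes."""
    return num_nodes * dim * bytes_per_value


for k in range(1, 4):
    n = sampled_nodes_upper_bound([10] * k)
    print(k, n, feature_bytes(n, 128))
```

With fanout 10, one target depends on 11, 111, and 1,111 nodes for 1, 2, and 3 layers. That is about 5.6 KB, 57 KB, and 569 KB of features. A mini-batch multiplies this by the number of target nodes.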
### Solutions

To address these problems, the paper introduces the MariusGNN system, which has the following characteristics:

- **Leveraging the entire storage hierarchy**: MariusGNN uses not only main memory but also disk storage, which lets it train on larger graphs.
- **Optimized sampling**: A new data structure, DENSE, uses delta encoding to reduce redundant computation and data transfer, which greatly speeds up sampling.
- **A new partition replacement policy**: The COMET policy minimizes disk I/O while preserving model accuracy, especially for link prediction tasks.

### Experimental Results

Experiments show that single-GPU training in MariusGNN is up to 8 times faster than eight-GPU deployments of existing systems across multiple datasets, at a much lower cost. For example, on the WikiKG90Mv2 dataset, MariusGNN completes training in only 8 hours for 36 US dollars. Existing systems require 6 days and 1,720 US dollars, roughly a 48-fold cost difference.

In summary, the paper aims to provide a resource-efficient, low-cost solution for large-scale GNN training through MariusGNN, helping bring graph neural networks into wider practical use.
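The delta idea behind DENSE is to avoid re-sampling neighborhoods that an earlier hop already sampled. The code below is a simplified illustration of only that idea, under stated assumptions:

- The graph is an in-memory adjacency dict.
- `delta_sample` and `naive_expansions` are hypothetical helpers, not part of MariusGNN's API.
- The actual DENSE data structure and its GPU-side processing are not modeled.

`delta_sample` expands each node at most once and records the nodes first reached at each hop (the "delta"). `naive_expansions` counts the work when every path is re-expanded.

```python
import random
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[int]]


def delta_sample(adj: Graph, targets: Sequence[int], fanouts: Sequence[int],
                 seed: int = 0) -> Tuple[List[List[int]], Dict[int, List[int]]]:
    """Multi-hop sampling that expands each node at most once.

    Returns (deltas, sampled): deltas[h] lists nodes first reached at hop h,
    and sampled[v] holds the neighbors sampled for v, reused by later hops.
    """
    rng = random.Random(seed)
    deltas: List[List[int]] = [list(dict.fromkeys(targets))]  # dedupe, keep order
    seen = set(deltas[0])
    sampled: Dict[int, List[int]] = {}
    for fanout in fanouts:
        new_nodes: List[int] = []
        for v in deltas[-1]:  # only the previous hop's *new* nodes are expanded
            nbrs = adj.get(v, [])
            pick = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            sampled[v] = pick
            for u in pick:
                if u not in seen:  # already-seen nodes reuse their earlier samples
                    seen.add(u)
                    new_nodes.append(u)
        deltas.append(new_nodes)
    return deltas, sampled


def naive_expansions(adj: Graph, targets: Sequence[int], num_hops: int) -> int:
    """Neighbor-list expansions when every path is re-expanded (full fanout)."""
    frontier, count = list(targets), 0
    for _ in range(num_hops):
        count += len(frontier)
        frontier = [u for v in frontier for u in adj.get(v, [])]
    return count


adj: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
deltas, sampled = delta_sample(adj, targets=[0], fanouts=[3, 3, 3])
print(deltas)                                        # [[0], [1, 2], [3], []]
print(len(sampled), naive_expansions(adj, [0], 3))  # 4 8
```

On this toy graph, no node has more than 3 neighbors, so both approaches see full neighbor lists. The delta version expands 4 nodes for a 3-hop neighborhood, while naive re-expansion performs 8 expansions because nodes 0, 1, and 2 are revisited along different paths.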

MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks
