HitGNN: High-throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform

Yi-Chien Lin, Bingyi Zhang, Viktor Prasanna

2023-03-03

Abstract: As the size of real-world graphs increases, training Graph Neural Networks (GNNs) has become time-consuming and requires acceleration. While previous works have demonstrated the potential of utilizing FPGAs for accelerating GNN training, few works have been carried out to accelerate GNN training with multiple FPGAs due to the necessity of hardware expertise and substantial development effort. To this end, we propose HitGNN, a framework that enables users to effortlessly map GNN training workloads onto a CPU+Multi-FPGA platform for acceleration. In particular, HitGNN takes the user-defined synchronous GNN training algorithm, GNN model, and platform metadata as input, determines the design parameters based on the platform metadata, and automatically performs hardware mapping onto the CPU+Multi-FPGA platform. HitGNN consists of the following building blocks: (1) high-level application programming interfaces (APIs) that allow users to specify various synchronous GNN training algorithms and GNN models with only a handful of lines of code; (2) a software generator that generates a host program that performs mini-batch sampling, manages CPU-FPGA communication, and handles workload balancing among the FPGAs; (3) an accelerator generator that generates GNN kernels with optimized datapath and memory organization. We show that existing synchronous GNN training algorithms such as DistDGL and PaGraph can be easily deployed on a CPU+Multi-FPGA platform using our framework, while achieving high training throughput. Compared with the state-of-the-art frameworks that accelerate synchronous GNN training on a multi-GPU platform, HitGNN achieves up to 27.21x bandwidth efficiency, and up to 4.26x speedup using much less compute power and memory bandwidth than GPUs. In addition, HitGNN demonstrates good scalability to 16 FPGAs on a CPU+Multi-FPGA platform.

Distributed, Parallel, and Cluster Computing; Hardware Architecture

What problem does this paper attempt to address?

The problem this paper addresses: as real-world graphs grow, training Graph Neural Networks (GNNs) becomes time-consuming and requires acceleration. Although previous research has demonstrated the potential of Field Programmable Gate Arrays (FPGAs) for accelerating GNN training, few studies target multi-FPGA platforms, because doing so demands hardware expertise and substantial development effort. To overcome these challenges, the authors propose HitGNN, a framework that automatically maps synchronous GNN training algorithms and GNN models onto a CPU+Multi-FPGA heterogeneous platform. Specifically, HitGNN addresses the following issues:

1. **High hardware expertise requirement**: Developing efficient GNN accelerators requires deep hardware knowledge. HitGNN lowers this barrier by providing high-level Application Programming Interfaces (APIs) and generating the hardware design automatically.
2. **Load balancing on multi-FPGA platforms**: The workload assigned to different FPGAs may be unbalanced, which degrades performance. HitGNN balances the load across FPGAs with a two-stage task scheduler.
3. **High data communication overhead**: Data communication between FPGAs is costly on multi-FPGA platforms. HitGNN reduces this inter-FPGA communication overhead through its optimized design.
4. **Low development efficiency**: Manually developing and optimizing GNN kernels takes considerable time and effort. HitGNN provides a parameterized GNN kernel library and uses a Design Space Exploration (DSE) engine to determine the best accelerator configuration automatically.

With these optimizations, HitGNN both accelerates GNN training and significantly lowers the development barrier, allowing users to deploy existing synchronous GNN training algorithms and models on CPU+Multi-FPGA platforms. Compared with state-of-the-art frameworks on multi-GPU platforms, HitGNN achieves up to 27.21x bandwidth efficiency and up to 4.26x speedup while using less compute power and memory bandwidth.
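To make the load-balancing issue concrete, here is a minimal Python sketch of one generic way to spread mini-batches across several FPGAs. It is not HitGNN's two-stage task scheduler, whose details are not given here. It uses a standard greedy "largest job first" heuristic as a stand-in. The function name `balance_batches`, the cost values, and the idea of measuring cost as, for example, the number of sampled edges per mini-batch are all assumptions made for illustration.

```python
import heapq


def balance_batches(batch_costs, num_fpgas):
    """Greedily assign mini-batches to FPGAs (illustrative, not HitGNN's scheduler).

    batch_costs: estimated cost of each mini-batch (hypothetical unit,
                 e.g. number of sampled edges).
    Returns (assignment, loads): batch ids per FPGA and total cost per FPGA.
    """
    # Min-heap of (accumulated_cost, fpga_id): the least-loaded FPGA is on top.
    heap = [(0, fpga_id) for fpga_id in range(num_fpgas)]
    heapq.heapify(heap)
    assignment = {fpga_id: [] for fpga_id in range(num_fpgas)}

    # Visit mini-batches from most to least expensive.
    order = sorted(range(len(batch_costs)),
                   key=lambda i: batch_costs[i], reverse=True)
    for batch_id in order:
        load, fpga_id = heapq.heappop(heap)
        assignment[fpga_id].append(batch_id)
        heapq.heappush(heap, (load + batch_costs[batch_id], fpga_id))

    loads = {f: sum(batch_costs[b] for b in batches)
             for f, batches in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    costs = [9, 7, 6, 5, 4, 3, 2]  # hypothetical per-batch costs
    assignment, loads = balance_batches(costs, num_fpgas=3)
    print(assignment)  # {0: [0, 5], 1: [1, 4, 6], 2: [2, 3]}
    print(loads)       # {0: 12, 1: 13, 2: 11}
```

In synchronous training every FPGA must finish its share before the next iteration can start, so the most heavily loaded device sets the iteration time. In this toy run the heaviest load is 13 against an ideal of 12 (36 units over 3 devices).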

HP-GNN: Generating High Throughput GNN Training Implementation on CPU-FPGA Heterogeneous Platform

Low-latency Mini-batch GNN Inference on CPU-FPGA Heterogeneous Platform

FP-GNN: Adaptive FPGA Accelerator for Graph Neural Networks

GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis

HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture

A Unified CPU-GPU Protocol for GNN Training

HuGraph: Acceleration of GCN Training on Heterogeneous FPGA Clusters with Quantization

BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices

A Gather Accelerator for GNNs on FPGA Platform

GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching

CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling

fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference

HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation

GraphAGILE: An FPGA-based Overlay Accelerator for Low-latency GNN Inference

Graph-OPU: A Highly Integrated FPGA-Based Overlay Processor for Graph Neural Networks

L-FNNG: Accelerating Large-Scale KNN Graph Construction on CPU-FPGA Heterogeneous Platform

EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks

GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator Frontend with Graph Decoupling and Recoupling