HP-GNN: Generating High Throughput GNN Training Implementation on CPU-FPGA Heterogeneous Platform

Yi-Chien Lin, Bingyi Zhang, Viktor Prasanna

DOI: https://doi.org/10.1145/3490422.3502359

2021-12-22

Abstract: Graph Neural Networks (GNNs) have shown great success in many applications such as recommendation systems, molecular property prediction, and traffic prediction. Recently, CPU-FPGA heterogeneous platforms have been used to accelerate many applications by exploiting the customizable data path and abundant user-controllable on-chip memory resources of FPGAs. Yet, accelerating and deploying GNN training on such platforms requires not only expertise in hardware design but also substantial development effort. We propose HP-GNN, a novel framework that generates high-throughput GNN training implementations on a given CPU-FPGA platform and that can benefit both application developers and machine learning researchers. HP-GNN takes GNN training algorithms and GNN models as inputs, and automatically performs hardware mapping onto the target CPU-FPGA platform. HP-GNN consists of: (1) a data layout and internal representation that reduce the memory traffic and random memory accesses; (2) optimized hardware templates that support various GNN models; (3) a design space exploration engine for automatic hardware mapping; (4) high-level application programming interfaces (APIs) that allow users to specify GNN training with only a handful of lines of code. To evaluate HP-GNN, we experiment with two well-known sampling-based GNN training algorithms and two GNN models. For each training algorithm and model, HP-GNN generates an implementation on a state-of-the-art CPU-FPGA platform. Compared with CPU-only and CPU-GPU platforms, experimental results show that the generated implementations achieve $55.67\times$ and $2.17\times$ speedup on average, respectively. Compared with the state-of-the-art GNN training implementations, HP-GNN achieves up to $4.45\times$ speedup.

Distributed, Parallel, and Cluster Computing

What problem does this paper attempt to address?

This paper addresses how to generate high-throughput graph neural network (GNN) training implementations on a CPU-FPGA heterogeneous platform. It proposes a framework named HP-GNN that automatically maps GNN training algorithms and models onto the target CPU-FPGA platform, improving the efficiency and performance of GNN training.

Although GNNs have achieved remarkable success in fields such as recommendation systems, molecular property prediction, and traffic prediction, GNN training on large-scale graph data still faces memory bandwidth limitations and high computational complexity. In addition, existing acceleration methods usually require specialized hardware design knowledge, which limits their adoption. HP-GNN addresses these problems in the following ways:

1. **Data Layout and Internal Representation**: Optimizes data storage and access patterns, reducing memory traffic and random memory accesses.
2. **Optimized Hardware Templates**: Support multiple GNN models and provide flexible hardware acceleration schemes.
3. **Design Space Exploration Engine**: Automates hardware mapping so that the configuration adapts to different GNN training algorithms and parameters.
4. **High-Level Application Programming Interfaces (APIs)**: Let users specify GNN training in only a handful of lines of code, without in-depth knowledge of hardware details.

Together, these components improve the throughput of GNN training and lower the development barrier, allowing more application developers and machine learning researchers to use CPU-FPGA platforms for GNN training.

Experimental results show that, compared with CPU-only and CPU-GPU platforms, the implementations generated by HP-GNN achieve average speedups of $55.67\times$ and $2.17\times$, respectively. Compared with state-of-the-art GNN training implementations, HP-GNN achieves up to $4.45\times$ speedup.
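The idea of a design space exploration engine can be illustrated with a small sketch. The Python below is not HP-GNN's engine or API: the resource budget, workload sizes, cost model, and the names `estimate_cycles`, `dsp_cost`, and `explore` are all hypothetical, chosen only to show the general pattern of enumerating hardware configurations, discarding those that exceed a resource budget, and keeping the one with the lowest estimated execution time.

```python
from itertools import product

# All constants below are made-up illustration values, not figures from the paper.
DSP_BUDGET = 4096            # assumed number of DSP slices available on the FPGA
EDGES_PER_BATCH = 200_000    # assumed edges in one sampled mini-batch
VERTICES_PER_BATCH = 20_000  # assumed vertices in one sampled mini-batch
FEATURE_LEN = 256            # assumed input feature length
HIDDEN_LEN = 256             # assumed hidden feature length


def estimate_cycles(agg_pes, upd_pes):
    """Toy model: aggregation and update are pipelined, so the slower stage dominates."""
    agg_cycles = EDGES_PER_BATCH * FEATURE_LEN / agg_pes
    upd_cycles = VERTICES_PER_BATCH * FEATURE_LEN * HIDDEN_LEN / upd_pes
    return max(agg_cycles, upd_cycles)


def dsp_cost(agg_pes, upd_pes):
    """Toy resource model: one DSP slice per processing element (an assumption)."""
    return agg_pes + upd_pes


def explore(candidates=(64, 128, 256, 512, 1024, 2048, 4096)):
    """Exhaustively search parallelism pairs and return (cycles, agg_pes, upd_pes)."""
    best = None
    for agg_pes, upd_pes in product(candidates, repeat=2):
        if dsp_cost(agg_pes, upd_pes) > DSP_BUDGET:
            continue  # configuration does not fit on the device
        cycles = estimate_cycles(agg_pes, upd_pes)
        if best is None or cycles < best[0]:
            best = (cycles, agg_pes, upd_pes)
    return best


best_cycles, best_agg, best_upd = explore()
print(f"aggregation PEs={best_agg}, update PEs={best_upd}, est. cycles={best_cycles:.0f}")
```

In this toy model the update stage dominates, so the search spends most of the budget on update parallelism. A real engine would use a performance model calibrated to the target platform and would also account for memory bandwidth and on-chip memory, which this sketch ignores.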

HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform

Low-latency Mini-batch GNN Inference on CPU-FPGA Heterogeneous Platform

GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis

HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture

FP-GNN: Adaptive FPGA Accelerator for Graph Neural Networks

A Unified CPU-GPU Protocol for GNN Training

BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices

HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation

CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling

Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms

HuGraph: Acceleration of GCN Training on Heterogeneous FPGA Clusters with Quantization

Graph-OPU: A Highly Integrated FPGA-Based Overlay Processor for Graph Neural Networks

A Gather Accelerator for GNNs on FPGA Platform

GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator Frontend with Graph Decoupling and Recoupling

L-FNNG: Accelerating Large-Scale KNN Graph Construction on CPU-FPGA Heterogeneous Platform

NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference