RAHP: A Redundancy-aware Accelerator for High-performance Hypergraph Neural Network
Hui Yu, Yu Zhang, Ligang He, Yingqi Zhao, Xintao Li, Ruida Xin, Jin Zhao, Xiaofei Liao, Haikun Liu, Bingsheng He, Hai Jin
DOI: https://doi.org/10.1109/micro61859.2024.00094
2024-01-01
Abstract: Hypergraph Neural Network (HyperGNN) has emerged as a potent methodology for dissecting intricate multilateral connections among various entities. Current software/hardware solutions adopt a sequential execution model that relies on hyperedge and vertex indices to perform standard matrix operations for HyperGNN inference. Yet, they suffer from two challenges: redundant computation and irregular memory access overheads. Both stem primarily from the frequent, repetitive access and updating of the feature vectors of the same hyperedges and vertices. To address these challenges, we propose RAHP, the first redundancy-aware accelerator, which enables high-performance HyperGNN inference. Specifically, we integrate a redundancy-aware asynchronous execution approach into the accelerator design for HyperGNN to reduce redundant computations and off-chip memory accesses. To unveil data-reuse opportunities and unlock the parallelism that existing HyperGNN solutions fail to capture, this approach prioritizes the vertices with the highest degree as roots, prefetches other vertices along the hypergraph structure to capture the common vertices shared by multiple hyperedges, and synchronizes the computations of hyperedges and vertices in real time. This allows the relevant hyperedge and vertex computations of the common vertices to be processed concurrently along the hypergraph topology, reducing redundant computation overhead. Furthermore, by efficiently caching the intermediate results of the common vertices, it curtails memory traffic and off-chip communication. To fully harness the performance potential of this approach, RAHP incorporates a topology-driven data loading mechanism that minimizes off-chip memory accesses on the fly. It is also equipped with an adaptive data synchronization scheme that mitigates the effects of conflicting updates to both hyperedges and vertices.
Moreover, RAHP employs a similarity-based data caching strategy to further reduce the overhead of redundant data transfers. We have implemented and evaluated RAHP on a Xilinx Alveo U280 FPGA card. Experimental evaluations show that RAHP achieves average speedups of 439.2x and 64.7x for HyperGNN inference, along with average energy savings of 542.8x and 84.2x, over cutting-edge software-based HyperGNN implementations on Intel Xeon CPUs and NVIDIA A100 GPUs, respectively. Additionally, for HyperGNN inference, RAHP achieves average speedups of 7.8x, 5.4x, and 3.8x, and average energy savings of 10.2x, 8.9x, and 6.5x, over the foremost GNN accelerators, i.e., FlowGNN, LL-GNN, and ReGNN, respectively.
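To make the data-reuse argument concrete, the following is a minimal software sketch, not RAHP's hardware design. It assumes a simplified model in which processing a hyperedge requires fetching the feature vector of every incident vertex, and the on-chip buffer behaves as an LRU cache of vertex features. It compares the index-based order with a hypothetical degree-prioritized order: visit vertices from highest to lowest degree and schedule every not-yet-processed hyperedge that contains the current root, so hyperedges sharing a common vertex run back to back. The function names, the LRU policy, and the tie-breaking rule are illustrative assumptions. The sketch does not model asynchronous execution, conflicting-update handling, or similarity-based caching.

```python
from collections import OrderedDict


def build_incidence(hyperedges):
    """Return vertex degrees and, for each vertex, the hyperedges that contain it."""
    degree, incident = {}, {}
    for eid, edge in enumerate(hyperedges):
        for v in edge:
            degree[v] = degree.get(v, 0) + 1
            incident.setdefault(v, []).append(eid)
    return degree, incident


def degree_prioritized_order(hyperedges):
    """Hypothetical schedule: take vertices as roots from highest to lowest degree
    and emit every not-yet-scheduled hyperedge incident to the current root, so
    hyperedges sharing a common vertex are processed consecutively."""
    degree, incident = build_incidence(hyperedges)
    roots = sorted(degree, key=lambda v: (-degree[v], v))  # ties broken by vertex id
    scheduled, order = set(), []
    for root in roots:
        for eid in incident[root]:
            if eid not in scheduled:
                scheduled.add(eid)
                order.append(eid)
    return order


def count_offchip_loads(hyperedges, order, capacity):
    """Count vertex-feature fetches from off-chip memory when hyperedges are
    processed in `order` with an LRU on-chip cache holding `capacity` vertices."""
    cache = OrderedDict()
    loads = 0
    for eid in order:
        for v in hyperedges[eid]:
            if v in cache:
                cache.move_to_end(v)  # on-chip hit: reuse the cached feature vector
            else:
                loads += 1  # miss: fetch the feature vector from off-chip memory
                cache[v] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used vertex
    return loads


if __name__ == "__main__":
    # Toy hypergraph: vertices 0 and 3 are each shared by three hyperedges.
    hyperedges = [[0, 1, 2], [3, 4, 5], [0, 6, 7], [3, 8, 9], [0, 3, 10]]
    index_order = list(range(len(hyperedges)))
    prioritized = degree_prioritized_order(hyperedges)
    print("prioritized order:", prioritized)
    print("index-order loads:", count_offchip_loads(hyperedges, index_order, capacity=4))
    print("prioritized loads:", count_offchip_loads(hyperedges, prioritized, capacity=4))
```

On this toy input with a 4-entry cache, the index order incurs 14 off-chip loads. The prioritized order [0, 2, 4, 1, 3] incurs 11, one per distinct vertex, because the common vertices 0 and 3 stay cached while their hyperedges are processed together.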