A Data-Centric Accelerator for High-Performance Hypergraph Processing

Qinggang Wang, Long Zheng, Ao Hu, Yu Huang, Pengcheng Yao, Chuangyi Gui, Xiaofei Liao, Hai Jin, Jingling Xue
DOI: https://doi.org/10.1109/micro56248.2022.00088
2022-01-01
Abstract: Hypergraph processing has emerged as a powerful approach for analyzing complex multilateral relationships among multiple entities. Past research on building hypergraph systems suggests that changing the scheduling order of bipartite edge tasks can improve the overlap-induced data locality in hypergraph processing. However, due to the complex intertwined connections between vertices and hyperedges, it is almost impossible to find a locality-optimal scheduling order. Thus, these task-centric hypergraph systems often suffer from substantial off-chip communications. In this paper, we first propose a novel data-centric Load-Trigger-Reduce (LTR) execution model to fully exploit the locality in hypergraph processing. Unlike a task-centric model that loads the required data along with a task, our LTR model invokes tasks as per the data used. Specifically, once the hypergraph data is loaded into the on-chip memory, all of its relevant computation tasks will be triggered simultaneously to output intermediate results, which are finally reduced to update the final results. Our LTR model enables all hypergraph data to be accessed once in each iteration. To fully exploit the LTR performance potential, we further architect an LTR-driven hypergraph accelerator, XuLin, which features an adaptive data loading mechanism to minimize the loading cost via chunk merging at runtime. XuLin is also equipped with a priority-based differential data reduction scheme to reduce the impact of conflicting updates on performance. We have implemented XuLin both on a Xilinx Alveo U250 FPGA card and using a cycle-accurate simulator. The results show that XuLin outperforms the state-of-the-art hypergraph processing solutions Hygra and ChGraph by $20.47 \times$ and $8.77 \times$ on average, respectively.
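The Load-Trigger-Reduce idea described in the abstract can be illustrated with a small software sketch. This is not the authors' implementation and does not model XuLin's hardware, chunk merging, or priority-based differential reduction. It only assumes a toy setting: vertex values are split into fixed-size chunks, each chunk is loaded once, every bipartite edge task that reads the chunk is triggered, and the partial results are reduced into per-hyperedge values. The function name `ltr_iteration`, its parameters, and the use of `min` as the reduction operator are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from functools import reduce


def ltr_iteration(vertex_values, incidence, chunk_size, combine=min):
    """One vertex-to-hyperedge pass in a Load-Trigger-Reduce style (toy model).

    vertex_values: list of per-vertex values, indexed by vertex id.
    incidence:     dict mapping vertex id -> list of hyperedge ids containing it.
    chunk_size:    number of vertices per simulated on-chip chunk (assumption).
    combine:       associative, commutative reduction operator (assumption).
    Returns (hyperedge_values, number_of_chunk_loads).
    """
    hyperedge_values = {}
    loads = 0
    for start in range(0, len(vertex_values), chunk_size):
        # Load: fetch one chunk of vertex data; each chunk is fetched once.
        chunk = vertex_values[start:start + chunk_size]
        loads += 1

        # Trigger: run every task that reads data in this chunk,
        # collecting intermediate results per destination hyperedge.
        partial = defaultdict(list)
        for offset, value in enumerate(chunk):
            for h in incidence.get(start + offset, []):
                partial[h].append(value)

        # Reduce: fold intermediate results into the final hyperedge values.
        for h, values in partial.items():
            local = reduce(combine, values)
            if h in hyperedge_values:
                hyperedge_values[h] = combine(hyperedge_values[h], local)
            else:
                hyperedge_values[h] = local
    return hyperedge_values, loads


# Toy hypergraph: 4 vertices, 3 hyperedges.
vertex_values = [5, 3, 8, 1]
incidence = {0: [0], 1: [0, 1], 2: [1], 3: [1, 2]}
result, loads = ltr_iteration(vertex_values, incidence, chunk_size=2)
print(result, loads)
```

In this sketch the number of chunk loads depends only on the data size and chunk size, not on how many tasks touch each vertex, which is the property the abstract attributes to the data-centric model. A task-centric loop would instead iterate over hyperedges and fetch each member vertex per task, so shared vertices could be fetched repeatedly.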