ScalaGraph: A Scalable Accelerator for Massively Parallel Graph Processing

Pengcheng Yao, Long Zheng, Yu Huang, Qinggang Wang, Chuangyi Gui, Zhen Zeng, Xiaofei Liao, Hai Jin, Jingling Xue
DOI: https://doi.org/10.1109/HPCA53966.2022.00023
2022-01-01
Abstract: Graph processing is a promising way to extract valuable insights from graphs. Emerging 3D-stacked memories and silicon technologies can now provide more than a terabyte per second of memory bandwidth and thousands of processing elements (PEs) to meet the high hardware demands of graph applications. However, this leap in hardware capability does not translate into a large performance gain for graph processing, and sometimes even degrades performance. In this paper, we discover that the centralized on-chip memory hierarchy adopted in existing graph accelerators is the root cause of this poor scalability, because its hardware overhead grows quadratically with the number of PEs. We present a novel distributed on-chip memory hierarchy that leverages the network-on-chip (NoC) to enable massively parallel graph processing. We architect ScalaGraph, a new graph processing accelerator, to exploit this insight. ScalaGraph adopts a software-hardware co-design to minimize NoC communication overheads via an efficient row-oriented dataflow mapping and runtime aggregation. A specialized scheduling mechanism is also proposed to mitigate load imbalance. Our results on a Xilinx Alveo U280 FPGA card show that ScalaGraph, in a modest configuration of 512 PEs, achieves 2.2× and 3.2× speedups over GraphDyns, a state-of-the-art graph accelerator, and Gunrock, a GPU-based graph system, respectively. Moreover, ScalaGraph supports at least 1,024 PEs with nearly linear performance scaling, whereas GraphDyns fails to work.