A Locality-Aware Energy-Efficient Accelerator for Graph Mining Applications.

Pengcheng Yao,Long Zheng,Zhen Zeng,Yu Huang,Chuangyi Gui,Xiaofei Liao,Hai Jin,Jingling Xue
DOI: https://doi.org/10.1109/micro50266.2020.00077
2020-01-01
Abstract:Graph mining is becoming increasingly important due to the ever-increasing demands on analyzing complex structures in graphs. Existing graph accelerators typically hold most of the randomly-accessed data in an on-chip memory to avoid off-chip communications. However, graph mining exhibits substantial random accesses from not only vertex dimension but also edge dimension (with the latter being excessively more complex than the former), leading to significant degradations in terms of both performance and energy efficiency.We observe that the most random memory requests arising in graph mining come from accessing a small fraction of valuable (vertex and edge) data when handling real-world graphs. To exploit this extension locality with maximum parallelism, we architect GRAMER, the first graph mining accelerator. GRAMER contains a specialized memory hierarchy, where the valuable data (precisely identified through a cost-efficient heuristic) is permanently resident in a high-priority memory while others are maintained in a cache-like memory under a lightweight replacement policy. The specific pipelined processing units are carefully designed to maximize computational parallelism. GRAMER is also equipped with a work-stealing mechanism to reduce load imbalance. We have implemented GRAMER on a Xilinx Alveo U250 accelerator card. Compared with two state-of-the-art CPU-based graph mining systems, Fractal and RStream, running on a 14-core Intel E5-2680 v4 processor, GRAMER achieves not only considerable speedups (1.11 × ~ 129.95 ) but also significant energy savings (5.79 × ~ 678.34×)
What problem does this paper attempt to address?