Improving Performance of Graph Processing on FPGA-DRAM Platform by Two-level Vertex Caching

Zhiyuan Shao, Ruoshi Li, Diqing Hu, Xiaofei Liao, Hai Jin
DOI: https://doi.org/10.1145/3289602.3293900
2019-01-01
Abstract: In recent years, graph processing has attracted considerable attention due to its broad applicability in solving real-world problems. With their flexibility and programmability, FPGA platforms offer the opportunity to process graph data with high efficiency. On FPGA-DRAM platforms, the state-of-the-art graph processing solution (i.e., ForeGraph) attaches local vertex buffers to each pipeline to cache the source and destination vertices during processing. Such a one-level vertex caching mechanism, however, results in excessive vertex data transmissions that consume precious DRAM bandwidth, and frequent pipeline stalls that waste the processing power of the FPGA. In this paper, we propose a two-level vertex caching mechanism to improve the performance of graph processing on FPGA-DRAM platforms by reducing the amount of vertex data transmission and the number of pipeline stalls during the execution of graph algorithms. We build a system, named FabGraph, that implements this two-level vertex caching mechanism using available on-chip storage resources, including BRAM and UltraRAM. Experimental results show that FabGraph achieves up to 3.1x and 2.5x speedups over ForeGraph for BFS and PageRank, respectively, on an FPGA board with relatively large BRAM, and up to 3.1x and 3.0x speedups over ForeGraph for BFS and PageRank, respectively, on an FPGA board with small BRAM but large UltraRAM. Our experience suggests that the two-level vertex caching design is effective in improving the performance of graph processing on FPGA-DRAM platforms.