Highly Efficient Breadth-First Search on CPU-Based Single-Node System

Chenglong Zhang,Huawei Cao,Xiaochun Ye,Guobo Wang,Qinfen Hao,Dongrui Fan
DOI: https://doi.org/10.1109/hpcc/smartcity/dss.2019.00286
2019-08-01
Abstract:With the rapid development of computer technology, the scale of graph increases explosively and large-scale graph computing has been the focus in recent years. Breadth-First Search (BFS) is one of the most important kernels in graph computing. It is the main kernel of the Graph500 benchmark that evaluates the performance of supercomputers and servers in terms of data-intensive applications. In this paper, we implement a highly efficient parallel BFS on CPU-based single-node systems. We propose two algorithm optimization techniques: round-robin vertex shuffle and jumping access. The former can improve the load balance for parallel implementation, while the latter can reduce unnecessary data accesses in the bottom-up algorithm. Our new implementation demonstrates strong scalability and achieves 65.2 GTEPS for Kronecker graph with $2^{\wedge}30$ vertices and $2^{\wedge}34$ edges on a 4-way Xeon server. The speedup is nearly 12.8 times faster than the original direction-optimizing algorithm. In terms of energy efficiency, the result on our self-designed ARM server is 254.6 MTEPS/W and is superior to conventional x86 server. To our best knowledge, this is the first work that evaluates BFS performance on ARM server and ARM server is suitable for data-intensive applications such as large-scale graph processing.
What problem does this paper attempt to address?