Scaling Graph 500 SSSP to 140 Trillion Edges with over 40 Million Cores

Yuanwei Wang,Huanqi Cao,Zixuan Ma,Wanwang Yin,Wenguang Chen
DOI: https://doi.org/10.1109/sc41404.2022.00024
2022-01-01
Abstract:The SSSP kernel was first introduced into the Graph 500 benchmark in 2017. However, there has been no result from a full-scale world-top supercomputer. The primary reason is the poor work-inefficiency of existing algorithms at large scales. In this paper, we propose an SSSP implementation for The Newest Generation Sunway Supercomputer,including an SSSP algorithm to achieve work-efficiency, along with an adaptive dense/sparse-mode selection approach to achieve communication-efficiency. Our implementation reaches 7638 GTEPS, with 103158 processors (over 40 million cores), and achieves 3.7× in performance and 512× in graph size compared with the current top one on the Graph 500 SSSP list. Based on our experience of running extreme-scale SSSP, we uncover the root cause of its poor scalability: the weight distribution allows edges with weights close to zero, making the SSSP tree deeper on larger graphs. We further explore a scalability-friendly weight distribution by setting a non-zero lower bound to the edge weights.
What problem does this paper attempt to address?