TianheStar: Orchestrating SSSP Applications on Tianhe Supercomputer
Xinbiao Gan, Qian Tang, Feng Xiong, Shijie Li, Bo Yang, Tiejun Li
DOI: https://doi.org/10.1109/ccgrid59990.2024.00066
2024-01-01
Abstract: Computing single-source shortest paths (SSSP) is a fundamental problem in graph theory and is essential for data-intensive applications. As the potential of artificial intelligence (AI) continues to be explored, and with the advent of exascale supercomputing, there is a growing need for an extremely fast graph engine for SSSP applications. Unfortunately, current distributed SSSP engines for large-scale graph applications often exhibit poor efficiency when running on supercomputers. In this paper, we introduce TianheStar, an ultra-fast SSSP engine designed specifically for graph search on the Tianhe supercomputer. TianheStar minimizes communication costs and establishes a new balance between computation and communication. The key idea of TianheStar is to leverage network topology information to perform topology-aware message aggregation and architecture-aware group communication. These two techniques reduce the number of messages and the average number of communication hops, respectively. We validate TianheStar using Graph500, a widely adopted benchmark for graph search on supercomputers. Extensive evaluation demonstrates that TianheStar achieves a remarkable performance improvement over state-of-the-art solutions. We have deployed TianheStar on the latest Tianhe supercomputer and secured the top position in the latest Graph500, achieving 23,021 GTEPS (Giga Traversed Edges Per Second) for SSSP using 4096 nodes. Furthermore, we evaluated TianheStar on real-world graphs representing the USA road networks, computing the shortest paths between vertices. Our experimental results demonstrate that TianheStar can traverse the USA road network, comprising more than 58,333,344 edges, in less than 0.1 second on the Tianhe supercomputer. This represents a speedup of over a thousand times compared to parallel shortest-path graph computations on the Aziz supercomputer, a globally renowned high-performance computing system, using the same input data.
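To illustrate how message aggregation can reduce the number of messages in a distributed SSSP step, the following is a minimal sketch and not TianheStar's actual implementation. It assumes a hypothetical layout in which vertices are assigned to nodes by `vertex % NODES` and nodes are grouped in blocks of `GROUP_SIZE`; the names `aggregate_relaxations`, `owner_of`, and `group_of` are introduced here for illustration only. Relaxation messages bound for the same group are merged into one batch, and only the smallest tentative distance per target vertex is kept.

```python
from collections import defaultdict

# Hypothetical layout (assumption, not from the paper):
# 8 nodes, grouped in blocks of 4 (e.g. nodes attached to the same switch).
NODES = 8
GROUP_SIZE = 4


def owner_of(vertex):
    """Node that owns a vertex under an assumed modulo partitioning."""
    return vertex % NODES


def group_of(node):
    """Group that a node belongs to under an assumed block grouping."""
    return node // GROUP_SIZE


def aggregate_relaxations(messages):
    """Merge (target_vertex, tentative_distance) messages per destination group.

    Returns {group_id: {target_vertex: smallest tentative distance}}.
    Each group receives one batch instead of one message per relaxation,
    and duplicate relaxations of the same vertex collapse to the minimum.
    """
    buffers = defaultdict(dict)
    for vertex, dist in messages:
        group = group_of(owner_of(vertex))
        best = buffers[group].get(vertex)
        if best is None or dist < best:
            buffers[group][vertex] = dist
    return dict(buffers)


# Six individual relaxation messages produced by one node in one step.
messages = [(1, 5.0), (9, 2.0), (1, 3.0), (6, 7.0), (14, 1.0), (9, 4.0)]
batched = aggregate_relaxations(messages)

for group, updates in sorted(batched.items()):
    print(f"group {group}: {updates}")
```

In this toy run, six point-to-point messages collapse into two batches carrying four distance updates in total. Keeping only the minimum is safe for SSSP because a larger tentative distance to the same vertex can never improve the result.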