Abstract: Computing single-source shortest paths (SSSP) is one of the fundamental problems in graph theory and is also essential for data-intensive applications. As the potential of artificial intelligence (AI) continues to be explored, and with the advent of exascale supercomputing, there is a growing need for an extremely fast graph engine for SSSP applications. Unfortunately, current distributed SSSP engines for large-scale graph applications often run inefficiently on supercomputers. In this paper, we introduce TianheStar, an ultra-fast SSSP engine designed specifically for graph search on the Tianhe supercomputer. TianheStar minimizes communication costs and establishes a new balance between computation and communication. Its key idea is to leverage network topology information to perform topology-aware message aggregation and architecture-aware group communication. These two techniques reduce the number of messages and the average number of communication hops, respectively. We validate TianheStar using Graph500, a widely adopted benchmark for graph search on supercomputers. Extensive evaluation shows that TianheStar substantially outperforms state-of-the-art solutions. We have deployed TianheStar on the latest Tianhe supercomputer, where it secured the top position in the latest Graph500, achieving 23,021 GTEPS (giga traversed edges per second) for SSSP on 4096 nodes. Furthermore, we apply TianheStar to real-world graphs representing the USA road networks and compute shortest paths between vertices. Our experimental results show that TianheStar traverses the USA road network, comprising 58,333,344 edges, in less than 0.1 seconds on the Tianhe supercomputer. This is a speedup of more than a thousandfold over parallel shortest-path computations on the Aziz supercomputer, a globally renowned high-performance computing system, using the same input data.
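The abstract identifies topology-aware message aggregation as the technique that reduces the number of messages. The Python sketch below illustrates the general idea under stated assumptions. It is not TianheStar's implementation. It assumes a hypothetical layout in which every `GROUP_SIZE` consecutive compute nodes form one group connected by cheap local links. It also assumes a hypothetical 1D vertex distribution (`owner_of`). Updates are gathered per source group and bucketed by destination group. Updates to the same vertex are merged by keeping the minimum tentative distance. As a result, each pair of groups exchanges one batched message instead of one message per sender/receiver pair. Only inter-group messages are counted; the local gather and scatter steps inside a group are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout (illustration only): nodes 0..num_nodes-1, and every
# GROUP_SIZE consecutive nodes form one group linked by cheap local links.
GROUP_SIZE = 4

Update = Tuple[int, int, float]            # (sending node, target vertex, tentative distance)
Batches = Dict[Tuple[int, int], Dict[int, float]]


def group_of(node: int) -> int:
    return node // GROUP_SIZE


def owner_of(vertex: int, num_nodes: int) -> int:
    # Hypothetical 1D distribution: vertex v is owned by node v % num_nodes.
    return vertex % num_nodes


def naive_inter_group(updates: List[Update], num_nodes: int) -> int:
    """Point-to-point baseline: one message per (sender, owner) pair crossing groups."""
    pairs = {(src, owner_of(v, num_nodes)) for src, v, _ in updates}
    return sum(1 for src, dst in pairs if group_of(src) != group_of(dst))


def aggregate(updates: List[Update], num_nodes: int) -> Batches:
    """Bucket updates by (source group, destination group).

    Updates to the same vertex inside one bucket are merged by keeping the
    minimum tentative distance, which is safe for SSSP edge relaxations.
    """
    batches: Batches = defaultdict(dict)
    for src, v, dist in updates:
        key = (group_of(src), group_of(owner_of(v, num_nodes)))
        batch = batches[key]
        if v not in batch or dist < batch[v]:
            batch[v] = dist
    return dict(batches)


def aggregated_inter_group(updates: List[Update], num_nodes: int) -> int:
    """One batched message per (source group, destination group) pair crossing groups."""
    return sum(1 for src_group, dst_group in aggregate(updates, num_nodes)
               if src_group != dst_group)
```

For example, take `num_nodes = 8` and suppose each of the four nodes in group 0 sends updates for vertices 4 to 7, all owned by group 1. The point-to-point scheme then needs 16 inter-group messages, while the aggregated scheme needs one.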

FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing

TopoX: Topology Refactorization for Efficient Graph Partitioning and Processing

TopoX: Topology Refactorization for Minimizing Network Communication in Graph Computations

Topo: Towards a Fine-grained Topological Data Processing Framework on Tianhe-3 Supercomputer

XTree: Traversal-Based Partitioning for Extreme-Scale Graph Processing on Supercomputers

TianheQueries: Ultra-Fast and Scalable Graph Queries on Tianhe Supercomputer

MST: Topology-Aware Message Aggregation for Exascale Graph Processing of Traversal-Centric Algorithms

TianheGraph: Customizing Graph Computation for Tianhe Exascale Supercomputing System

TianheGraph: Customizing Graph Search for Graph500 on Tianhe Supercomputer

ShenTu: Processing Multi-Trillion Edge Graphs on Millions of Cores in Seconds

A Topology-Aware Framework for Graph Traversals

TianheStar: Orchestrating SSSP Applications on Tianhe Supercomputer

Scaling Graph Traversal to 281 Trillion Edges with 40 Million Cores

A Two-Level Parallel Decomposition Approach for Transient Stability Constrained Optimal Power Flow

Optimizing Graph Partition by Optimal Vertex-Cut: A Holistic Approach

A Topology-Adaptive Strategy for Graph Traversing

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores

GraphService: Topology-aware Constructor for Large-scale Graph Applications

GraphCube: Interconnection Hierarchy-aware Graph Processing

A Method of Spatial Data Partition for Efficient Parallel Computing of Topological Relations

OHTMA: An Optimized Heuristic Topology-Aware Mapping Algorithm on the Tianhe-3 Exascale Supercomputer Prototype