5 ExaFlop/s HPL-MxP Benchmark with Linear Scalability on the 40-Million-Core Sunway Supercomputer.

Rongfen Lin, Xinhui Yuan, Wei Xue, Wanwang Yin, Jienan Yao, Junda Shi, Qiang Sun, Chaobo Song, Fei Wang
DOI: https://doi.org/10.1145/3581784.3607030
2023-01-01
Abstract: HPL-MxP is an emerging high-performance benchmark used to measure the mixed-precision computing capability of leading supercomputers. In this work, we present our implementation on the new Sunway supercomputer, which scales the benchmark linearly to over 40 million cores, sustains an overall mixed-precision performance exceeding 5 ExaFlop/s, and achieves over 85% of peak performance, the highest efficiency reached among all heterogeneous systems on the HPL-MxP list. The optimizations in our HPL-MxP implementation include the following: (1) a Two-Direction Look-Ahead and Overlap algorithm that overlaps all communication with computation; (2) a multi-level process-mapping and communication-scheduling method that uses the entire network as fully as possible while maintaining a conflict-free algorithm flow; and (3) a CG-Fusion computing framework that eliminates up to 60% of inter-chip communication and removes the memory-access bottleneck while serving computation and communication simultaneously. This work could also provide useful insights for tuning cutting-edge applications on Sunway supercomputers and on other heterogeneous supercomputers.
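For readers unfamiliar with the benchmark, the sketch below illustrates the general idea behind mixed-precision solvers of the kind HPL-MxP measures: factor the matrix in low precision, then recover double-precision accuracy by iterative refinement. It is a single-process illustration under stated assumptions, not the authors' implementation. It uses float32 for the factorization and classical iterative refinement in place of the GMRES-based refinement used by HPL-MxP, and the function names (lu_nopivot_fp32, lu_solve, solve_mixed) are hypothetical.

```python
import numpy as np


def lu_nopivot_fp32(a):
    """Unpivoted LU factorization carried out in float32.

    Returns L (unit lower, below the diagonal) and U (upper) packed into
    one matrix. Skipping pivoting is only safe here because the demo
    matrix is assumed to be diagonally dominant.
    """
    lu = a.astype(np.float32)  # astype returns a copy by default
    n = lu.shape[0]
    for k in range(n - 1):
        lu[k + 1:, k] /= lu[k, k]
        lu[k + 1:, k + 1:] -= np.outer(lu[k + 1:, k], lu[k, k + 1:])
    return lu


def lu_solve(lu, b):
    """Solve L U y = b by forward and backward substitution in float64."""
    n = lu.shape[0]
    y = b.astype(np.float64)
    for i in range(n):  # forward substitution, L has a unit diagonal
        y[i] -= lu[i, :i] @ y[:i]
    for i in range(n - 1, -1, -1):  # backward substitution with U
        y[i] = (y[i] - lu[i, i + 1:] @ y[i + 1:]) / lu[i, i]
    return y


def solve_mixed(a, b, tol=1e-12, max_iter=20):
    """Low-precision factorization plus float64 iterative refinement.

    Returns the solution and the number of refinement steps applied.
    """
    lu = lu_nopivot_fp32(a).astype(np.float64)  # factors computed in float32
    x = lu_solve(lu, b)
    steps = 0
    for _ in range(max_iter):
        r = b - a @ x  # residual evaluated in float64
        if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(b, np.inf):
            break
        x += lu_solve(lu, r)  # correction reuses the cheap factors
        steps += 1
    return x, steps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 64
    a = rng.random((n, n)) + n * np.eye(n)  # diagonally dominant test matrix
    b = rng.random(n)
    x, steps = solve_mixed(a, b)
    print("refinement steps:", steps)
    print("max abs residual:", np.abs(b - a @ x).max())
```

The expensive O(n^3) factorization runs in the fast low-precision format, while each refinement step costs only O(n^2) in double precision, which is why such benchmarks can report throughput well above a machine's double-precision rate.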