Performance Optimization of the HPCG Benchmark on the Sunway TaihuLight Supercomputer.

Yulong Ao, Chao Yang, Fangfang Liu, Wanwang Yin, Lijuan Jiang, Qiao Sun
DOI: https://doi.org/10.1145/3182177
IF: 1.444
2018-01-01
ACM Transactions on Architecture and Code Optimization
Abstract: In this article, we present key techniques for optimizing HPCG on Sunway TaihuLight and demonstrate how to achieve high performance in memory-bound applications by exploiting specific characteristics of the hardware architecture. In particular, we utilize a block multicoloring approach for parallelization and propose methods such as requirement-based data mapping and a customized gather collective to enhance the effective memory bandwidth. Experiments indicate that the optimized HPCG code can sustain 77% of the theoretical memory bandwidth and scale to the full system of more than 10 million cores, with an aggregated performance of 480.8 Tflop/s and a weak scaling efficiency of 87.3%.