Performance Modeling of Stencil Computation on SW26010 Processors

Yao Liu, Li Liu, Mengtao Hu, Wei Wang, Wei Xue, Qingting Zhu
DOI: https://doi.org/10.1007/978-3-030-60245-1_27
2020-01-01
Abstract: Stencil computation is a fundamental part of a wide variety of scientific computing programs, especially those involving partial differential equations. Because memory bandwidth is limited, improving the parallel efficiency of stencil computation on modern supercomputers is challenging. Performance modeling is the most common method of performance analysis. In this paper, we propose a generic performance model based on Sunway TaihuLight, which is powered by SW26010 heterogeneous many-core processors. The generic model describes the interaction between programs and the computing platform from the architecture perspective, and identifies the performance bottlenecks of programs from the optimization perspective. Furthermore, we propose a specific performance model of stencil computation on SW26010 processors, and optimize the performance of stencil computation under the guidance of this model. The experimental results show that the proposed performance models are effective: the average error ratio of the predicted performance is less than 7%. Guided by the specific model, the optimized stencil computation outperforms the unoptimized many-core version by 154.71% on 4096 cores.