Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling

Shizhen Xu, Yuanchao Xu, Wei Xue, Xipeng Shen, Fang Zheng, Xiaomeng Huang, Guangwen Yang
DOI: https://doi.org/10.1109/IPDPS.2018.00086
2018-01-01
Abstract: This paper presents an effort to overcome the complexities of program optimization on SW26010, the heterogeneous many-core processor that powers Sunway TaihuLight, the world's top-ranked supercomputer. The solution centers on a precise, static performance model for modern many-core processors. Through a careful design that leverages the special properties of SW26010 and an effective treatment of massive parallelism, the model achieves high accuracy, with an average error of less than 5% in estimating program execution performance. The precise performance model opens many opportunities for analyzing and guiding code optimizations. The paper demonstrates this usefulness by revealing a series of insights into the effects of several important code optimizations on SW26010. Moreover, it shows that with such a precise performance model, it is feasible to replace empirical auto-tuning with static auto-tuning for optimizing regular loops on heterogeneous many-core systems. This replacement speeds up the tuning process by up to a factor of 43 while keeping the loss in tuning quality below 6%.
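To make the contrast between empirical and static auto-tuning concrete, here is a minimal Python sketch. It is not the authors' model: the tuning knobs (`tile`, `unroll`), the cost coefficients in `predicted_cost`, the assumed 64 KB per-core local-store budget, and every helper name are hypothetical. Empirical auto-tuning would compile and time each candidate configuration; the static tuner below only evaluates a cost model, so ranking a candidate costs a function call rather than a program run.

```python
from dataclasses import dataclass
from itertools import product

# Assumption: 64 KB scratchpad per compute core, used here as a feasibility limit.
LOCAL_STORE_BYTES = 64 * 1024


@dataclass(frozen=True)
class Config:
    tile: int    # hypothetical knob: elements moved per DMA transfer
    unroll: int  # hypothetical knob: loop unroll factor


def fits(cfg: Config, elem_bytes: int = 8) -> bool:
    # A double-buffered tile must fit in the local store.
    return 2 * cfg.tile * elem_bytes <= LOCAL_STORE_BYTES


def predicted_cost(cfg: Config, n: int = 1 << 20) -> float:
    """Toy static model (NOT the paper's): cost in arbitrary units, no DMA/compute overlap."""
    # Fixed startup cost per transfer plus a per-element transfer cost.
    dma = n * (50.0 / cfg.tile + 0.5)
    # Unrolling amortizes loop overhead but adds a register-pressure penalty.
    compute = n * (1.0 + 2.0 / cfg.unroll + 0.05 * cfg.unroll)
    return dma + compute


def static_autotune(space):
    # Rank feasible candidates with the model only; nothing is compiled or executed.
    feasible = [c for c in space if fits(c)]
    return min(feasible, key=predicted_cost)


def quality_loss(t_chosen: float, t_best: float) -> float:
    # Relative slowdown of the chosen configuration versus the best measured one.
    return (t_chosen - t_best) / t_best


space = [Config(t, u) for t, u in product([64, 256, 1024, 4096, 8192], [1, 2, 4, 8, 16])]
best = static_autotune(space)
print(best)  # Config(tile=4096, unroll=8)
```

Here `tile=8192` is rejected by the capacity check, and the model trades transfer startup cost against register pressure to pick `unroll=8`. `quality_loss` illustrates the kind of metric behind the abstract's "below 6%" figure, assuming that figure is the relative slowdown of the statically chosen configuration against the empirically best one.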