An Auto Code Generator for Stencil on SW26010

Xiaomin Zhu, Yunhui Zeng, Yanjie Wei, Shengzhong Feng, Weiguo Liu, Pavan Balaji
DOI: https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00040
2019-01-01
Abstract: Stencil computation is a basic building block used in many HPC areas and applications. It generally dominates execution time and is critical to overall performance. Heterogeneous many-core processors are frequently used to build supercomputers, so porting and optimizing stencil codes on modern accelerator-based architectures is important. Porting is non-trivial, and optimization is even harder because it requires a deeper understanding of the underlying architecture. As a result, auto-tuning for accelerators such as GPUs has become a hot research topic. In this paper, we focus on automatically tuning stencils on SW26010, the heterogeneous many-core CPU used in the Sunway TaihuLight supercomputer. We mainly work on the 2D stencil variants in ROMS (Regional Ocean Modeling System). The code generator produces slave-core code for 50+ different stencil-like FORTRAN loops in ROMS in seconds, and it also integrates optimization methods. The generated code outperforms both the directive-based method (OpenACC) and code hand-written by a junior programmer. It is comparable to code written by a senior programmer with several years of programming experience.
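The following minimal C sketch makes the setting concrete. It is not the paper's generator or its output. It uses a 5-point Jacobi stencil as a stand-in for the ROMS loops and emulates, sequentially on the host, one plausible way to split such a loop into row blocks across the 64 slave cores of an SW26010 core group. Each block stages its rows, plus one halo row on each side, into a buffer that must fit in half of a 64 KB local memory. The names `stencil_ref`, `stencil_tiled`, and `slave_kernel` are hypothetical. The `memcpy` calls only stand in for DMA transfers; real slave-core code would typically use Sunway's Athread library instead.

```c
#include <assert.h>
#include <string.h>

/* Assumed SW26010 figures: a core group has 64 slave cores (CPEs), each
 * with a 64 KB local data memory (LDM). The sketch splits the LDM in half:
 * one buffer for input rows (plus halo), one for output rows. */
#define NUM_SLAVES 64
#define LDM_BYTES (64 * 1024)
#define LDM_HALF_DOUBLES (LDM_BYTES / 2 / (int)sizeof(double))

/* Emulated scratchpad buffers, reused by each emulated slave in turn. */
static double ldm_in[LDM_HALF_DOUBLES];
static double ldm_out[LDM_HALF_DOUBLES];

/* Reference: one 5-point Jacobi sweep on an nx-by-ny row-major grid.
 * Boundary points are copied unchanged. */
void stencil_ref(int nx, int ny, const double *in, double *out) {
    for (int i = 0; i < nx; i++)
        for (int j = 0; j < ny; j++) {
            const double *c = in + i * ny + j;
            if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1)
                out[i * ny + j] = *c;
            else
                out[i * ny + j] = 0.2 * (c[0] + c[-ny] + c[ny] + c[-1] + c[1]);
        }
}

/* Work for one emulated slave core `id` (hypothetical helper): it owns a
 * contiguous block of rows [r0, r1) and needs one halo row on each side. */
static void slave_kernel(int id, int nx, int ny, const double *in, double *out) {
    int base = nx / NUM_SLAVES, rem = nx % NUM_SLAVES;
    int r0 = id * base + (id < rem ? id : rem);
    int r1 = r0 + base + (id < rem ? 1 : 0);
    if (r0 == r1) return;                       /* more slaves than rows */
    int lo = r0 > 0 ? r0 - 1 : 0;               /* halo row above */
    int hi = r1 < nx ? r1 + 1 : nx;             /* halo row below (exclusive) */
    assert((hi - lo) * ny <= LDM_HALF_DOUBLES); /* tile must fit the buffer */

    /* Stand-in for a DMA get from main memory into local memory. */
    memcpy(ldm_in, in + lo * ny, (size_t)(hi - lo) * ny * sizeof(double));
    for (int i = r0; i < r1; i++)
        for (int j = 0; j < ny; j++) {
            const double *c = ldm_in + (i - lo) * ny + j;
            double *o = ldm_out + (i - r0) * ny + j;
            if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1)
                *o = *c;
            else
                *o = 0.2 * (c[0] + c[-ny] + c[ny] + c[-1] + c[1]);
        }
    /* Stand-in for a DMA put of the owned rows back to main memory. */
    memcpy(out + r0 * ny, ldm_out, (size_t)(r1 - r0) * ny * sizeof(double));
}

/* Runs every slave's share one after another. On real hardware the 64
 * shares would execute in parallel on the slave cores. */
void stencil_tiled(int nx, int ny, const double *in, double *out) {
    for (int id = 0; id < NUM_SLAVES; id++)
        slave_kernel(id, nx, ny, in, out);
}
```

Comparing `stencil_tiled` against `stencil_ref` on the same input is one simple way to validate a generated kernel. The halo rows explain why each block reads up to two more rows than it writes, and this overhead shrinks as blocks grow.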