Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer
Haohuan Fu, Liao Junfeng, Wei Xue, Lanning Wang, Dexun Chen, Ling Gu, Jianping Xu, Nan Ding, Xinliang Wang, Chuan He, Shuxin Xu, Yishuang Liang, Jiayuan Fang, Ye Xu, Wei Zheng, Jia Xu, Zhuozhao Zheng, Wanjing Wei, He Zhang, Bingwei Chen, Kaiwei Li, Xiaomeng Huang, Wenguang Chen, Guangwen Yang
DOI: https://doi.org/10.5555/3014904.3015016
2016-01-01
Abstract: This paper reports our efforts on refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer, which uses a many-core processor that consists of management processing elements (MPEs) and clusters of computing processing elements (CPEs). To map the large code base of CAM to the millions of cores on the Sunway system, we adopt OpenACC-based refactoring as the main approach and apply source-to-source translation tools to exploit the most suitable parallelism for the CPE cluster and to fit the intermediate variables into the limited on-chip fast buffer. For individual kernels, comparing the original ported version, which uses only MPEs, with the refactored version, which uses both the MPEs and the CPE clusters, we achieve up to 22× speedup for the compute-intensive kernels. For the 25 km resolution CAM global model, we manage to scale to 24,000 MPEs and 1,536,000 CPEs, and achieve a simulation speed of 2.81 model years per day.
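The scaling figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes 64 CPEs per MPE, which is the ratio implied by the two core counts quoted above (1,536,000 / 24,000) rather than a number stated in the abstract, and the function names are hypothetical.

```python
# Assumption: each MPE is paired with a cluster of 64 CPEs. This ratio is
# inferred from the abstract's core counts, not stated there explicitly.
CPES_PER_MPE = 64


def cpe_count(mpes: int, cpes_per_mpe: int = CPES_PER_MPE) -> int:
    """Total number of CPEs for a given number of MPEs."""
    return mpes * cpes_per_mpe


def wall_hours_per_model_year(model_years_per_day: float) -> float:
    """Wall-clock hours needed to simulate one model year."""
    return 24.0 / model_years_per_day


if __name__ == "__main__":
    print(cpe_count(24_000))                           # 1536000
    print(round(wall_hours_per_model_year(2.81), 2))   # 8.54
```

Under these assumptions, 2.81 model years per day corresponds to roughly eight and a half wall-clock hours per simulated year at the reported scale.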