Porting and Optimizing GTC-P on TaihuLight Supercomputer with OpenACC

Wang Yichao, Lin Xinhua, Cai Linjin, Tang William, Ethier Stephane, Wang Bei, See Simon, Satoshi Matsuoka
DOI: https://doi.org/10.7544/issn1000-1239.2018.20160871
2018-01-01
Journal of Computer Research and Development
Abstract: Sunway TaihuLight, with a sustained performance of 93 PFLOPS, is the No. 1 supercomputer on the latest Top500 list. It provides a high-level directive language called OpenACC that is compatible with the OpenACC 2.0 standard and includes some customized extensions. GTC-P is a discovery-science-capable, real-world application code based on the particle-in-cell (PIC) algorithm that is well established in the HPC area. Our aim is to port the GTC-P code to the TaihuLight supercomputer with OpenACC. Because the Sunway OpenACC compiler cannot currently resolve the performance bottlenecks of GTC-P when the code is ported directly to TaihuLight, we applied three optimizations to an "intermediate" version of the code generated by the compiler: 1) elimination of atomic operations; 2) avoidance of expensive global memory access instructions; 3) manual addition of SIMD intrinsics. Our numerical experiments show that these optimizations yield speed-ups of 1.6X and 8.6X on 64 CPE cores, compared with one MPE core, for the key charge and push PIC kernels, respectively. Overall, this acceleration makes the entire GTC-P code faster by a factor of 2.5. Our findings demonstrate that manual optimizations of the "intermediate" code are important for achieving significantly improved performance of PIC applications on TaihuLight with OpenACC.
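As an illustration of the first optimization, the sketch below shows one common way to remove atomic updates from a PIC charge-deposition loop: each worker scatters its particles into a private copy of the grid, and the copies are summed afterwards. The abstract does not say which technique the authors used, so this is an assumption, not their implementation. The 1D grid, the linear weighting, and the names `deposit_range` and `deposit_privatized` are hypothetical, and the worker loop is written serially in plain C rather than with Sunway OpenACC directives.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 1D charge deposition with linear weighting.
 * Each particle position x[i] must lie in [0, ngrid - 1); its charge q[i]
 * is split between the two neighbouring grid points. */
static void deposit_range(const double *x, const double *q,
                          size_t begin, size_t end, double *grid)
{
    for (size_t i = begin; i < end; ++i) {
        size_t cell = (size_t)x[i];          /* left grid point */
        double w = x[i] - (double)cell;      /* fractional offset in cell */
        grid[cell]     += q[i] * (1.0 - w);
        grid[cell + 1] += q[i] * w;
    }
}

/* Atomic-free deposition by privatization (illustrative assumption).
 * Returns 0 on success, -1 if the private buffers cannot be allocated. */
int deposit_privatized(const double *x, const double *q, size_t np,
                       double *grid, size_t ngrid, size_t nworkers)
{
    assert(nworkers > 0);

    /* One zero-initialized private grid per worker. */
    double *priv = calloc(nworkers * ngrid, sizeof *priv);
    if (priv == NULL)
        return -1;

    /* Phase 1: every worker scatters its own slice of particles into its
     * own grid copy. Iterations of this loop touch disjoint memory, so
     * they could run on separate cores without atomic updates. */
    for (size_t w = 0; w < nworkers; ++w) {
        size_t begin = np * w / nworkers;
        size_t end   = np * (w + 1) / nworkers;
        deposit_range(x, q, begin, end, priv + w * ngrid);
    }

    /* Phase 2: reduce the private copies into the shared grid. Each grid
     * point is written by exactly one iteration, so no atomics are needed. */
    for (size_t g = 0; g < ngrid; ++g) {
        double sum = 0.0;
        for (size_t w = 0; w < nworkers; ++w)
            sum += priv[w * ngrid + g];
        grid[g] += sum;
    }

    free(priv);
    return 0;
}
```

The trade-off in this sketch is memory: it needs `nworkers * ngrid` extra values, which matters on cores with small local stores, so a real port would likely privatize only a tile of the grid at a time.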