OpenACC Vs the Native Programming on Sunway TaihuLight: A Case Study with GTC-P
Linjin Cai, Yi-Chao Wang, William Tang, Bei Wang, Stephane Ethier, Zhao Liu, James Lin
DOI: https://doi.org/10.1109/cluster.2018.00021
2018-01-01
Abstract: Sunway TaihuLight, China's supercomputer that recently ranked first worldwide, was the first to be built entirely with home-grown processors. It can be programmed with two approaches: directive-based OpenACC and native programming. We study both approaches using GTC-P, a particle-in-cell code for investigating micro-turbulence in magnetic fusion plasmas, and compare the performance and programming effort of the OpenACC and native versions of GTC-P. The results show that in the OpenACC version, the kernel with irregular memory access becomes the main performance bottleneck because of poor data locality. To address this issue, we have applied two optimizations to the native version: (1) register-level communication (RLC); and (2) an "asynchronization" strategy. With these two optimizations, the native version achieves up to 2.5X speedup for the memory-bound kernel compared with the OpenACC version. In addition, we have scaled GTC-P to 4,259,840 cores of TaihuLight and present performance comparisons with several world-leading supercomputers.
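To illustrate what "irregular memory access" typically means in a particle-in-cell code, here is a minimal sketch of a scatter-style charge deposition loop. It is not taken from GTC-P: the abstract does not name the bottleneck kernel, and the function `deposit_charge`, the 1D periodic grid, and the linear weighting are all assumptions chosen for brevity. The point is only that the grid index written by each iteration depends on particle data, so unsorted particles touch memory in an unpredictable order.

```python
import random


def deposit_charge(positions, num_cells, cell_width=1.0):
    """Hypothetical 1D deposition: scatter each particle's unit charge onto
    its two nearest grid points (linear weighting, periodic boundary).
    Positions are assumed to be non-negative."""
    grid = [0.0] * num_cells
    for x in positions:
        s = x / cell_width
        cell = int(s)
        left = cell % num_cells          # index depends on particle data
        right = (left + 1) % num_cells
        w = s - cell                     # fractional distance from left point
        grid[left] += 1.0 - w            # data-dependent (irregular) writes
        grid[right] += w
    return grid


random.seed(0)
n_cells = 8
particles = [random.uniform(0.0, n_cells) for _ in range(1000)]
density = deposit_charge(particles, n_cells)
print(sum(density))  # total deposited charge, close to the particle count
```

Sorting the particles by position before the loop would make consecutive iterations write to neighboring grid points, which is one common way to improve locality in such kernels; whether GTC-P does this on TaihuLight is not stated in the abstract.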