Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code and Large-Scale Performance Test on TH-1A.

Xiangfei Meng, Xiaoqian Zhu, Peng Wang, Yang Zhao, Xin Liu, Bao Zhang, Yong Xiao, Wenlu Zhang, Zhihong Lin
DOI: https://doi.org/10.1007/978-3-642-38750-0_7
2013-01-01
Abstract: In this work, we discuss the porting to the GPU platform of the latest production version of the Gyrokinetic Toroidal Code (GTC), a petascale fusion simulation code that uses the particle-in-cell method. New GPU parallel algorithms have been designed for the particle push and shift operations. The GPU version of GTC was benchmarked on up to 3072 nodes of the Tianhe-1A supercomputer and shows about 2x-3x overall speedup when comparing NVIDIA M2050 GPUs to Intel Xeon X5670 CPUs. Strong and weak scaling studies have been performed using actual production simulation parameters, providing insights into GTC's scalability and bottlenecks on large GPU supercomputers.