Analysis and MPI Implementation of LQCD Dslash on Sunway TaihuLight
Miao ZHANG, Yu ZHOU, Jianhai CHEN, Qinming HE, Shun XU, Ming GONG
2019-01-01
Abstract: Sunway TaihuLight is a supercomputer with more than ten million cores, developed independently by China. Many large-scale applications have been ported to and optimized on it. However, lattice quantum chromodynamics (LQCD), an application from high energy physics, has not yet been ported to or optimized on the Sunway platform, a gap that has attracted the attention of researchers. This paper studies the porting and optimization of LQCD on the Sunway platform. First, it reviews domestic and international progress in the parallel optimization of LQCD on different hardware architectures. Second, by restructuring Dslash, the hotspot module of LQCD, it ports the application successfully to the Sunway platform. Third, following the architecture and parallel model of the heterogeneous many-core SW26010 processor, it implements heterogeneous parallelism on the computing processing element (CPE) cluster, direct memory access (DMA) transfers between the CPE local device memory (LDM) and main memory, message passing interface (MPI) communication between the management processing elements (MPEs), and global reduction. Finally, in experiments, the optimized single core group (CG) version and the optimized 16-CG version achieve speedups of 165 and 25 times, respectively, over the single-MPE version, and several important performance bottlenecks are identified, laying a foundation for further optimization of overall performance. This work also contributes to the wider adoption of the domestic supercomputing platform.
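The global reduction named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the ranks are simulated inside a single Python process, no MPI or Sunway interface is called, and the function names (`split_sites`, `local_norm2`, `global_norm2`) and the toy field are hypothetical. It assumes a block partition of lattice sites over 16 ranks, one per CG, with each rank computing a partial squared norm that is then summed, as an all-reduce would do.

```python
# Sketch only: ranks are simulated in one process; no MPI or Sunway API is used.

def split_sites(num_sites, num_ranks):
    """Return (start, stop) index ranges that block-partition the lattice sites."""
    base, extra = divmod(num_sites, num_ranks)
    ranges = []
    start = 0
    for rank in range(num_ranks):
        # The first `extra` ranks take one additional site each.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def local_norm2(field, start, stop):
    """Partial squared norm over the sites owned by one simulated rank."""
    return sum(field[i].real ** 2 + field[i].imag ** 2 for i in range(start, stop))


def global_norm2(field, num_ranks):
    """Stand-in for an all-reduce sum: combine every rank's partial result."""
    partials = [local_norm2(field, a, b)
                for a, b in split_sites(len(field), num_ranks)]
    return sum(partials)


# Hypothetical toy field: one complex number per site.
field = [complex(i % 3, -(i % 2)) for i in range(64)]
total = global_norm2(field, 16)
```

In a real MPI program each partial sum would live on a different process and the final `sum` would be replaced by a collective reduction; the sketch only shows why the partitioned result must equal the serial one.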