A GPU-enabled acceleration algorithm for the CAM5 cloud microphysics scheme

Yan Hong, Yuzhu Wang, Xuanying Zhang, Xiaocong Wang, He Zhang, Jinrong Jiang
DOI: https://doi.org/10.1007/s11227-023-05360-7
IF: 3.3
2023-05-09
The Journal of Supercomputing
Abstract: The National Center for Atmospheric Research released the Community Atmosphere Model version 5.0 (CAM5), a global atmosphere model that provides global climate simulation for meteorological research. Within CAM5, the cloud microphysics scheme is extremely time-consuming, which makes large-scale, long-duration simulations challenging and motivates efficient parallel algorithms. Given the wide use of GPUs in science and engineering and NVIDIA's mature and stable CUDA platform, we ported the code to the GPU to accelerate computation. In this paper, by analyzing the parallelism of the CAM5 cloud microphysics scheme (CAM5 CMS) in different dimensions, we propose corresponding GPU-based one-dimensional (1D) and two-dimensional (2D) parallel acceleration algorithms, of which the 2D algorithm exploits finer-grained parallelism. In addition, we present a data transfer optimization method between the CPU and GPU to further improve overall performance. Finally, a GPU version of the CAM5 CMS (GPU-CMS) was implemented. With I/O transfer included, GPU-CMS obtains a speedup of 141.69× on a single NVIDIA A100 GPU.
In the case without I/O transfer, compared to the baseline performance on a single Intel Xeon E5-2680 CPU core, the 2D acceleration algorithm obtained speedups of 48.75×, 280.11×, and 507.18× on a single NVIDIA K20, P100, and A100 GPU, respectively.
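The reported speedups can be related with a short calculation. The sketch below is illustrative only and is not taken from the paper. It assumes that the 141.69× figure uses the same single-core CPU baseline and the same 2D algorithm as the 507.18× figure, so the two differ only by transfer time. Under that assumption, the share of GPU-side wall time spent on transfer is 1 - S_with_io / S_without_io. The names `transfer_fraction`, `SPEEDUP_NO_IO`, and `SPEEDUP_A100_WITH_IO` are hypothetical.

```python
# Speedups quoted in the abstract (baseline: one Intel Xeon E5-2680 CPU core).
SPEEDUP_NO_IO = {"K20": 48.75, "P100": 280.11, "A100": 507.18}
SPEEDUP_A100_WITH_IO = 141.69


def transfer_fraction(speedup_with_io: float, speedup_without_io: float) -> float:
    """Share of GPU-side time spent on CPU<->GPU transfer.

    Assumption (not stated in the abstract): both speedups share one CPU
    baseline T_cpu, so S_io = T_cpu / (T_kernel + T_io) and
    S = T_cpu / T_kernel, which gives T_io / (T_kernel + T_io) = 1 - S_io / S.
    """
    return 1.0 - speedup_with_io / speedup_without_io


# Estimated transfer share on the A100 under the assumption above.
a100_transfer_share = transfer_fraction(SPEEDUP_A100_WITH_IO, SPEEDUP_NO_IO["A100"])

# Kernel-only speedup of each GPU relative to the K20.
relative_to_k20 = {gpu: s / SPEEDUP_NO_IO["K20"] for gpu, s in SPEEDUP_NO_IO.items()}

print(f"A100 transfer share of GPU-side time: {a100_transfer_share:.1%}")
for gpu, ratio in relative_to_k20.items():
    print(f"{gpu}: {ratio:.2f}x the K20 kernel-only speedup")
```

If the assumption holds, roughly 72% of the A100's end-to-end time would go to data transfer, which is consistent with the abstract's emphasis on optimizing CPU-GPU transfers.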
computer science, theory & methods; engineering, electrical & electronic, hardware & architecture