Optimizing high-resolution Community Earth System Model on a heterogeneous many-core supercomputing platform

Shaoqing Zhang, Haohuan Fu, Lixin Wu, Yuxuan Li, Hong Wang, Yunhui Zeng, Xiaohui Duan, Wubing Wan, Li Wang, Yuan Zhuang, Hongsong Meng, Kai Xu, Ping Xu, Lin Gan, Zhao Liu, Sihai Wu, Yuhu Chen, Haining Yu, Shupeng Shi, Lanning Wang, Shiming Xu, Wei Xue, Weiguo Liu, Qiang Guo, Jie Zhang, Guanghui Zhu, Yang Tu, Jim Edwards, Allison Baker, Jianlin Yong, Man Yuan, Yangyang Yu, Qiuying Zhang, Zedong Liu, Mingkui Li, Dongning Jia, Guangwen Yang, Zhiqiang Wei, Jingshan Pan, Ping Chang, Gokhan Danabasoglu, Stephen Yeager, Nan Rosenbloom, Ying Guo

Abstract. With semiconductor technology gradually approaching its physical and thermal limits, recent supercomputers have adopted major architectural changes to keep increasing performance through more power-efficient heterogeneous many-core systems. Examples include Sunway TaihuLight, which has four management processing elements (MPEs) and 256 computing processing elements (CPEs) inside one processor, and Summit, which has two central processing units (CPUs) and six graphics processing units (GPUs) inside one node. Meanwhile, current high-resolution Earth system models, which urgently require more computing power, generally consist of millions of lines of legacy code developed for traditional homogeneous multicore processors and cannot automatically benefit from advances in supercomputer hardware. As a result, refactoring and optimizing these legacy models for new architectures has become a key challenge in taking advantage of greener and faster supercomputers, providing better support for the global climate research community, and contributing to the long-lasting societal task of addressing long-term climate change. This article reports the efforts of a large group in the International Laboratory for High-Resolution Earth System Prediction (iHESP), established through a collaboration among the Qingdao Pilot National Laboratory for Marine Science and Technology (QNLM), Texas A&M University (TAMU), and the National Center for Atmospheric Research (NCAR), to enable highly efficient simulations of the high-resolution (25 km atmosphere and 10 km ocean) Community Earth System Model (CESM-HR) on Sunway TaihuLight. These refactoring and optimizing efforts have improved the simulation speed of CESM-HR from 1 SYPD (simulation years per day) to 3.4 SYPD (with output disabled) and have supported several hundred years of pre-industrial control simulations.
With further strategies for deeper refactoring and optimization of the remaining computing hotspots, as well as redesigned architecture-oriented algorithms, we expect to achieve efficiency on the new platform equivalent to, or even better than, that on traditional homogeneous CPU platforms. The refactoring and optimizing processes detailed in this paper for the Sunway system should have implications for similar efforts on other heterogeneous many-core systems, such as GPU-based high-performance computing (HPC) systems.
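To make the SYPD figures in the abstract concrete, the short Python sketch below converts throughput into wall-clock time. It assumes SYPD is simply model years simulated divided by wall-clock days; the helper names `sypd` and `wall_clock_days` and the 100-year run length are hypothetical, chosen for illustration rather than taken from CESM or from the paper.

```python
def sypd(simulated_years: float, wall_clock_days: float) -> float:
    """Simulation years per day: model years completed per wall-clock day (assumed definition)."""
    return simulated_years / wall_clock_days


def wall_clock_days(simulated_years: float, rate_sypd: float) -> float:
    """Wall-clock days needed to simulate a given number of model years at a given SYPD."""
    return simulated_years / rate_sypd


BASELINE_SYPD = 1.0   # CESM-HR throughput before refactoring (from the abstract)
OPTIMIZED_SYPD = 3.4  # throughput after refactoring, output disabled (from the abstract)

# Relative speedup of the optimized model over the baseline.
speedup = OPTIMIZED_SYPD / BASELINE_SYPD

# Hypothetical run length, used only to illustrate the arithmetic.
YEARS = 100
baseline_days = wall_clock_days(YEARS, BASELINE_SYPD)
optimized_days = wall_clock_days(YEARS, OPTIMIZED_SYPD)

if __name__ == "__main__":
    print(f"speedup: {speedup:.1f}x")
    print(f"{YEARS} model years: {baseline_days:.1f} days -> {optimized_days:.1f} days")
```

Run as a script, this prints `speedup: 3.4x` and `100 model years: 100.0 days -> 29.4 days`. Because the 3.4 SYPD figure was measured with output disabled, production runs that write history files would be expected to take somewhat longer than this estimate.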