Refactoring and Optimizing WRF Model on Sunway TaihuLight

Kai Xu, Zhenya Song, Yuandong Chan, Shida Wang, Xiangxu Meng, Weiguo Liu, Wei Xue
DOI: https://doi.org/10.1145/3337821.3337923
2019-01-01
Abstract: The Weather Research and Forecasting (WRF) Model is one of the most widely used mesoscale numerical weather prediction systems and is designed for both atmospheric research and operational forecasting applications. However, it is an extremely time-consuming application: as simulation sizes scale up and computing demands grow, a single simulation can take researchers days to weeks. In this paper, we port and optimize the whole WRF model to the Sunway TaihuLight supercomputer at a large scale. For the dynamic core in WRF, we present a domain-specific tool, SWSLL, a directive-based compiler tool for the Sunway many-core architecture that converts stencil computations into optimized parallel code. We also apply a decomposition strategy in SWSLL to improve memory locality and decrease the number of off-chip memory accesses. For the physical parameterizations, we explore thread-level parallelization with OpenACC directives, reorganizing data layouts and loops to achieve high performance. We present the algorithms and implementations and demonstrate the optimizations on a complex, real-world atmospheric model running on the Sunway TaihuLight supercomputer. Evaluation results show that, for the widely used benchmark with a horizontal resolution of 2.5 km, the proposed algorithms and optimization strategies achieve a speedup of 4.7 for the whole WRF model. In terms of strong scalability, our implementation scales well to hundreds of thousands of heterogeneous cores on Sunway TaihuLight.
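The general idea behind decomposing a stencil to cut off-chip memory traffic can be illustrated with a minimal Python sketch. This is not SWSLL or the paper's actual decomposition strategy: the function names `stencil_naive` and `stencil_tiled`, the 5-point averaging stencil, and the tile sizes are hypothetical. The sketch copies each tile plus a one-cell halo into a small local buffer, standing in for the software-managed scratchpad of a Sunway compute core, and counts how many elements are read from the global array. The comparison assumes that every neighbor access in the untiled loop goes to main memory, as it would on a core without a data cache.

```python
def stencil_naive(a):
    """5-point averaging stencil on interior points; boundary copied unchanged.

    Every interior point reads 5 values straight from the global array
    (assumed here to be off-chip memory with no cache in between).
    """
    n, m = len(a), len(a[0])
    out = [row[:] for row in a]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.2 * (a[i][j] + a[i - 1][j] + a[i + 1][j]
                               + a[i][j - 1] + a[i][j + 1])
    return out


def stencil_tiled(a, tile_i, tile_j):
    """Same stencil, computed tile by tile through a local buffer.

    Each tile of interior points is copied together with a 1-cell halo into
    `local` (a hypothetical stand-in for on-chip scratchpad memory), computed
    there, and written back. Returns (result, elements read from `a`).
    """
    n, m = len(a), len(a[0])
    out = [row[:] for row in a]
    reads = 0
    for i0 in range(1, n - 1, tile_i):
        for j0 in range(1, m - 1, tile_j):
            i1 = min(i0 + tile_i, n - 1)  # exclusive end of interior rows
            j1 = min(j0 + tile_j, m - 1)  # exclusive end of interior cols
            # One bulk copy of rows i0-1..i1 and cols j0-1..j1 (halo included).
            local = [a[i][j0 - 1:j1 + 1] for i in range(i0 - 1, i1 + 1)]
            reads += (i1 - i0 + 2) * (j1 - j0 + 2)
            for i in range(i0, i1):
                li = i - i0 + 1  # row index inside the local buffer
                for j in range(j0, j1):
                    lj = j - j0 + 1  # column index inside the local buffer
                    out[i][j] = 0.2 * (local[li][lj] + local[li - 1][lj]
                                       + local[li + 1][lj] + local[li][lj - 1]
                                       + local[li][lj + 1])
    return out, reads


grid = [[float(i * 10 + j) for j in range(10)] for i in range(10)]
ref = stencil_naive(grid)
res, reads = stencil_tiled(grid, 4, 4)
assert res == ref  # identical arithmetic, so results match exactly
print(reads, 5 * 8 * 8)  # 144 320
```

On this 10 x 10 grid, four 4 x 4 tiles each load a 6 x 6 block, so 144 global reads replace the 320 the untiled loop performs; larger tiles amortize the halo further, but the tile must still fit in the limited local memory, which is the trade-off a real decomposition has to balance.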