An Automatic Performance Model-Based Scheduling Tool for Coupled Climate System Models
Ding Nan, Xue Wei, Song Zhenya, Fu Haohuan, Xu Shiming, Zheng Weimin
DOI: https://doi.org/10.1016/j.jpdc.2018.01.002
IF: 4.542
2019-01-01
Journal of Parallel and Distributed Computing
Abstract: The ability to predict the climate system depends heavily on the efficient integration of observations and simulations of the Earth, which is regarded as a canonical example of a cyber–physical system. The climate system model, the simulation engine in this cyber–physical system, is one of the most challenging applications in scientific computing. It uses multi-physics simulation that couples multiple components, runs simulations spanning decades to millennia, and has long been an important application on supercomputers. However, current climate system models suffer from inefficient task scheduling methods, which result in intolerable simulation times. Take the Community Earth System Model (CESM), the most widely used climate system model, as an example: one major reason for its poor performance is the huge overhead of rationally distributing processes among the coupled heterogeneous components. According to an NCAR report, every percent of improvement in CESM performance frees up to the equivalent of $250,000 in computing resources for their scientific experiments. To address this challenge, our paper first constructs a lightweight and accurate performance model that captures and predicts the heterogeneous time-to-solution performance of end-to-end CESM components for a given simulation configuration. Then, based on the performance model, we propose an efficient scheduling strategy based on a rectangular packing method to determine the best process layout among the components and the number of processes assigned to each component. Our evaluations show that we achieve an average run time reduction of 58% on CESM compared with the widely used sequential process layout at scales of 144–480 cores on typical CPU clusters. We can also save 4 million CPU hours when conducting one standard scientific experiment (a 2870-year simulation), which is equivalent to saving $40,089 at a charge of $0.01 per CPU hour.
Meanwhile, our method gains an additional 26% performance improvement over a heuristic branch-and-bound algorithm guided by a known curve-fitting performance model.
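
The idea of choosing a process layout from a performance model can be illustrated with a minimal sketch. It is not the paper's rectangular packing method: it assumes a hypothetical per-component model of the form time(n) = work / n + serial, uses invented coefficients and component names (not measured CESM values), and searches core splits exhaustively. It compares a sequential layout, where every component uses all cores in turn, with a concurrent layout, where components run side by side on disjoint cores.

```python
# Hypothetical model per component: time(n) = parallel_work / n + serial_cost.
# Coefficients are invented for illustration; they are not CESM measurements.
COMPONENTS = {
    "atm": (1200.0, 5.0),
    "ocn": (600.0, 3.0),
    "ice": (200.0, 2.0),
}


def predicted_time(name, cores):
    """Predicted run time of one component on a given number of cores."""
    work, serial = COMPONENTS[name]
    return work / cores + serial


def sequential_layout(total_cores):
    """All components run one after another, each using every core."""
    return sum(predicted_time(name, total_cores) for name in COMPONENTS)


def best_concurrent_layout(total_cores):
    """Exhaustively split the cores among three components that run
    side by side; the run time is set by the slowest component."""
    first, second, third = COMPONENTS
    best_time, best_alloc = float("inf"), None
    for a in range(1, total_cores - 1):
        for b in range(1, total_cores - a):
            c = total_cores - a - b  # remaining cores, always >= 1
            alloc = {first: a, second: b, third: c}
            t = max(predicted_time(n, k) for n, k in alloc.items())
            if t < best_time:
                best_time, best_alloc = t, alloc
    return best_time, best_alloc


if __name__ == "__main__":
    cores = 144
    seq = sequential_layout(cores)
    conc, alloc = best_concurrent_layout(cores)
    print(f"sequential: {seq:.2f}  concurrent: {conc:.2f}  split: {alloc}")
```

Exhaustive search is only practical for a handful of components and modest core counts; it stands in here for the packing-based search described in the abstract.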