Productive Performance Engineering for Weather and Climate Modeling with Python
Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver D. Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer, Thomas Schulthess, Torsten Hoefler
DOI: https://doi.org/10.48550/arXiv.2205.04148
2022-05-09
Distributed, Parallel, and Cluster Computing
Abstract: Earth system models are developed with a tight coupling to target hardware, often containing specialized code predicated on processor characteristics. This coupling stems from using imperative languages that hard-code computation schedules and layout. We present a detailed account of optimizing the Finite Volume Cubed-Sphere Dynamical Core (FV3), improving productivity and performance. By using a declarative Python-embedded stencil domain-specific language and data-centric optimization, we abstract hardware-specific details and define a semi-automated workflow for analyzing and optimizing weather and climate applications. The workflow utilizes both local and full-program optimization, as well as user-guided fine-tuning. To prune the infeasible global optimization space, we automatically utilize repeating code motifs via a novel transfer tuning approach. On the Piz Daint supercomputer, we scale to 2,400 GPUs, achieving speedups of up to 3.92x over the tuned production implementation with a fraction of the original code size.
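To illustrate the separation the abstract describes between a declarative stencil and a hard-coded schedule, the following is a minimal sketch in plain NumPy. It is not the API of the DSL used in the paper; the names `laplacian`, `run_loops`, and `run_vectorized` are hypothetical and chosen only for this example. The stencil states what is computed at each point through relative offsets, and two interchangeable executors decide how the grid is traversed.

```python
import numpy as np


def laplacian(f):
    """Declarative stencil: says WHAT is computed at a point.

    `f(di, dj)` returns the field value at relative offset (di, dj).
    Nothing here fixes loop order, memory layout, or target hardware.
    """
    return f(-1, 0) + f(1, 0) + f(0, -1) + f(0, 1) - 4.0 * f(0, 0)


def run_loops(stencil, field):
    """Schedule A (hypothetical): explicit point-by-point loops over the interior."""
    ni, nj = field.shape
    out = np.zeros_like(field)
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            # The accessor is called immediately, so it sees the current i, j.
            out[i, j] = stencil(lambda di, dj: field[i + di, j + dj])
    return out


def run_vectorized(stencil, field):
    """Schedule B (hypothetical): the same stencil applied to shifted array slices."""
    ni, nj = field.shape
    out = np.zeros_like(field)

    def shifted(di, dj):
        # Interior view of the field, displaced by (di, dj).
        return field[1 + di : ni - 1 + di, 1 + dj : nj - 1 + dj]

    out[1:-1, 1:-1] = stencil(shifted)
    return out


if __name__ == "__main__":
    # Example field f(i, j) = i^2 + j^2 on a small grid.
    field = np.fromfunction(lambda i, j: i**2 + j**2, (6, 7))
    a = run_loops(laplacian, field)
    b = run_vectorized(laplacian, field)
    print(np.allclose(a, b))
```

Because the stencil definition never mentions iteration order or layout, the executor can be swapped without touching the numerical code. A real toolchain would replace these two toy executors with generated, hardware-specific code.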