accelerating wrf i/o performance with adios2 and network-based streaming

Erick Fredj,Yann Delorme,Sameeh Jubran,Mark Wasserman,Zhaohui Ding,Michael Laufer
DOI: https://doi.org/10.48550/arXiv.2304.06603
2023-04-13
Abstract:With the approach of Exascale computing power for large-scale High Performance Computing (HPC) clusters, the gap between compute capabilities and storage systems is growing larger. This is particularly problematic for the Weather Research and Forecasting Model (WRF), a widely-used HPC application for high-resolution forecasting and research that produces sizable datasets, especially when analyzing transient weather phenomena. Despite this issue, the I/O modules within WRF have not been updated in the past ten years, resulting in subpar parallel I/O performance. This research paper demonstrates the positive impact of integrating ADIOS2, a next-generation parallel I/O framework, as a new I/O backend option in WRF. It goes into detail about the challenges encountered during the integration process and how they were addressed. The resulting I/O times show an over tenfold improvement when using ADIOS2 compared to traditional MPI-I/O based solutions. Furthermore, the study highlights the new features available to WRF users worldwide, such as the Sustainable Staging Transport (SST) enabling Unified Communication X (UCX) DataTransport, the node-local burst buffer write capabilities and in-line lossless compression capabilities of ADIOS2. Additionally, the research shows how ADIOS2's in-situ analysis capabilities can be smoothly integrated with a simple WRF forecasting pipeline, resulting in a significant improvement in overall time to solution. This study serves as a reminder to legacy HPC applications that incorporating modern libraries and tools can lead to considerable performance enhancements with minimal changes to the core application.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?