ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction

Juan Nathaniel, Yongquan Qu, Tung Nguyen, Sungduk Yu, Julius Busecke, Aditya Grover, Pierre Gentine
2024-09-28
Abstract: Accurate prediction of climate on the subseasonal-to-seasonal (S2S) scale is crucial for disaster preparedness and robust decision making amidst climate change. Yet forecasting beyond the weather timescale is challenging because it involves problems beyond initial conditions, including boundary interaction, the butterfly effect, and our inherent lack of physical understanding. At present, existing benchmarks tend to have a shorter forecasting range of up to 15 days, do not include a wide range of operational baselines, and lack physics-based constraints for explainability. Thus, we propose ChaosBench, a challenging benchmark to extend the predictability range of data-driven weather emulators to the S2S timescale. First, ChaosBench comprises variables beyond the typical surface-atmospheric ERA5 to also include ocean, ice, and land reanalysis products that span over 45 years, allowing for full Earth system emulation that respects boundary conditions. We also propose physics-based metrics, in addition to deterministic and probabilistic ones, to ensure a physically consistent ensemble that accounts for the butterfly effect. Furthermore, we evaluate a diverse set of physics-based forecasts from four national weather agencies as baselines for our data-driven counterparts such as ViT/ClimaX, PanguWeather, GraphCast, and FourCastNetV2. Overall, we find that methods originally developed for weather-scale applications fail on the S2S task: their performance simply collapses to an unskilled climatology. Nonetheless, we outline and demonstrate several strategies that can extend the predictability range of existing weather emulators, including the use of ensembles, robust control of error propagation, and the use of physics-informed models. Our benchmark, datasets, and instructions are available at https://leap-stc.github.io/ChaosBench.
Computer Vision and Pattern Recognition, Computers and Society
What problem does this paper attempt to address?
The problem that this paper attempts to solve is that in subseasonal-to-seasonal (S2S) climate prediction, existing prediction methods perform poorly in the time range exceeding 15 days. Specifically, current benchmarks usually only cover shorter prediction ranges (such as 1-15 days), lack extensive operational baselines, and have no physics-based constraints to improve the interpretability of models. Therefore, the authors propose a new benchmark named ChaosBench, aiming to expand the prediction range of data-driven weather simulators so that they can effectively cope with the challenges on the S2S time scale.

### Main problems

1. **Limited prediction range**: Existing benchmarks mainly focus on short-term (1-5 days), medium-term (5-15 days) and long-term (years to decades) prediction ranges, while the prediction on the S2S time scale (15 days to several months) is relatively less studied.
2. **Lack of physical constraints**: Existing benchmarks lack physics-based constraints, resulting in poor interpretability and physical consistency of models.
3. **Insufficient diversity of baseline models**: Most benchmarks mainly focus on increasing the number of data-driven models while ignoring the diversity of physical models.

### Solutions

1. **Multi-channel, physics-based benchmark**: ChaosBench includes not only typical surface-atmosphere variables (such as ERA5), but also reanalysis products of the ocean, ice and land, with a time span of more than 45 years, in order to achieve comprehensive earth-system simulation.
2. **Physical and probabilistic indicators**: Physics-based indicators (such as spectral divergence and spectral residual) as well as deterministic and probabilistic indicators are introduced to ensure the physical consistency and interpretability of models.
3. **Diverse baseline models**: 44-day-ahead physical control (deterministic) and perturbed (ensemble) forecasts from four national meteorological agencies are provided as baselines, increasing the diversity of baseline models.

### Specific contributions

1. **Dataset**: A comprehensive dataset covering more than 45 years of multi-system observation data is provided, including 124 input variables and 124 target variables.
2. **Evaluation indicators**: A series of evaluation indicators are proposed, including deterministic indicators (such as RMSE, Bias, ACC, MS-SSIM), physics-based indicators (such as SpecDiv, SpecRes) and probabilistic indicators (such as CRPS, CRPSS, Spread/Skill Ratio).
3. **Experimental results**: Through extensive experiments, the performance of existing models on the S2S time scale has been verified. It is found that the performance of most models drops significantly in long-term prediction, but the prediction ability can be effectively improved through strategies such as ensemble forecasting, controlling error propagation and introducing physical knowledge.

### Conclusion

ChaosBench provides a comprehensive, multi-channel, physics-based benchmark platform for S2S time-scale climate prediction, aiming to promote the further development of data-driven and physics-based models in this field. Future work will include improving the spatial and temporal resolution of data and integrating multi-source reanalysis products to further enhance the prediction ability.
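
### Illustrative sketch of the evaluation indicators

To make the deterministic and probabilistic indicators named above more concrete, the following is a minimal sketch of RMSE, Bias, ACC, and an empirical ensemble CRPS on a toy latitude-longitude grid. It is not taken from the ChaosBench code base: the function names, the array shapes, the zero climatology, and the synthetic data are all assumptions made for illustration, and the metrics here are unweighted and uncentered, whereas a benchmark implementation may apply latitude weighting or other normalization.

```python
import numpy as np

def rmse(pred, truth):
    """Root-mean-square error over all grid points (no area weighting)."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def bias(pred, truth):
    """Mean signed error; positive values indicate over-prediction."""
    return float(np.mean(pred - truth))

def acc(pred, truth, clim):
    """Uncentered anomaly correlation coefficient relative to a climatology."""
    pred_anom = pred - clim
    truth_anom = truth - clim
    denom = np.sqrt(np.sum(pred_anom ** 2) * np.sum(truth_anom ** 2))
    return float(np.sum(pred_anom * truth_anom) / denom)

def crps_ensemble(ens, truth):
    """Empirical CRPS for an ensemble whose members lie along axis 0.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, evaluated per grid point
    and then averaged over the grid.
    """
    err_term = np.mean(np.abs(ens - truth), axis=0)
    pair_term = np.mean(np.abs(ens[:, None] - ens[None, :]), axis=(0, 1))
    return float(np.mean(err_term - 0.5 * pair_term))

# Synthetic toy data (assumed shapes: 4 latitudes x 8 longitudes, 5 members).
rng = np.random.default_rng(0)
clim = np.zeros((4, 8))                    # hypothetical climatology
truth = rng.normal(size=(4, 8))            # hypothetical "observed" field
pred = truth + 1.0                         # forecast with a constant +1 offset
ens = truth[None] + rng.normal(size=(5, 4, 8))  # noisy 5-member ensemble

print("RMSE:", rmse(pred, truth))
print("Bias:", bias(pred, truth))
print("ACC :", acc(pred, truth, clim))
print("CRPS:", crps_ensemble(ens, truth))
```

With a single ensemble member, this CRPS estimate reduces to the mean absolute error, which is one way to compare deterministic and ensemble forecasts on the same scale.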