WeatherBench 2: A benchmark for the next generation of data-driven global weather models

Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russel, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, Fei Sha
2024-01-26
Abstract: WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed with the aim to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework, publicly available training, ground truth and baseline data as well as a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. In addition, we also discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting.
Atmospheric and Oceanic Physics, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to accelerate improvement in global medium-range (1-14 day) weather forecasting. Physical numerical weather prediction (NWP) models have made remarkable progress through higher resolution, larger forecast ensembles, better assimilation of observational data, and more accurate representation of physical processes, but there is still much room for improvement. According to recent research estimates, the intrinsic predictability limit for mid-latitude weather is approximately 15 days, while the current practical limit is about 10 days. Of the remaining gap of roughly 5 days, about half of the potential improvement is attributed to better models and the other half to better initial conditions.

To speed up progress, the paper proposes an updated benchmark, WeatherBench 2 (WB2), intended to promote the development of data-driven weather modeling. WB2 supports higher-resolution data and evaluation and adds further evaluation metrics. The paper describes the design principles, evaluation metrics, and datasets of WB2, and presents results for current state-of-the-art physical and data-driven weather models. It also discusses the limitations of the current evaluation setup and the challenges facing future data-driven weather forecasting.

Specifically, the goals of WB2 are to:

1. **Provide an open-source evaluation framework**, including publicly available training, ground truth, and baseline data, as well as a continuously updated website showing the latest metrics and state-of-the-art models.
2. **Define a set of headline scores** that give an overview of model performance, while discussing the limitations of these scores.
3. **Emphasize the importance of probabilistic prediction**, because weather forecasting is inherently uncertain and reliable decision-making requires probabilistic information.
4. **Provide a dynamic open-source framework** that can keep evolving with the needs of the ML-weather community.

Through these goals, WB2 hopes to promote the development of data-driven weather models so that they can better serve social and economic needs.
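To make the distinction between deterministic headline scores and probabilistic scores concrete, here is a minimal sketch of two metrics commonly used in operational forecast verification: a latitude-weighted RMSE and an ensemble estimate of the continuous ranked probability score (CRPS). The summary above does not spell out the exact formulas, so the function names, the cosine-latitude weighting, and the specific CRPS estimator below are all assumptions chosen for illustration, not the WB2 implementation.

```python
import math


def latitude_weights(lats_deg):
    """Cosine-latitude weights, normalized so that their mean is 1.

    Assumption: cos(latitude) approximates the relative grid-cell area
    on a regular latitude-longitude grid.
    """
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE over a grid indexed as [lat][lon]."""
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


def ensemble_crps(members, observation):
    """Ensemble CRPS estimate for one scalar: E|X - y| - 0.5 * E|X - X'|.

    Lower is better. With a single member it reduces to the absolute error.
    """
    m = len(members)
    skill = sum(abs(x - observation) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread


if __name__ == "__main__":
    # Hypothetical 3x2 grid (3 latitudes, 2 longitudes) with a uniform error of 2.
    lats = [-45.0, 0.0, 45.0]
    truth = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    forecast = [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
    print(weighted_rmse(forecast, truth, lats))

    # Hypothetical two-member ensemble straddling the observation.
    print(ensemble_crps([0.0, 2.0], 1.0))
```

The weighting keeps the densely spaced grid points near the poles from dominating a global average. The CRPS rewards an ensemble both for being close to the observation and for having a spread that matches its error, which is why it is a natural companion to goal 3 above.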