WeatherReal: A Benchmark Based on In-Situ Observations for Evaluating Weather Models

Weixin Jin, Jonathan Weyn, Pengcheng Zhao, Siqi Xiang, Jiang Bian, Zuliang Fang, Haiyu Dong, Hongyu Sun, Kit Thambiratnam, Qi Zhang
2024-09-14
Abstract: In recent years, AI-based weather forecasting models have matched or even outperformed numerical weather prediction systems. However, most of these models have been trained and evaluated on reanalysis datasets like ERA5. These datasets, being products of numerical models, often diverge substantially from actual observations in some crucial variables like near-surface temperature, wind, precipitation and clouds - parameters that hold significant public interest. To address this divergence, we introduce WeatherReal, a novel benchmark dataset for weather forecasting, derived from global near-surface in-situ observations. WeatherReal also features a publicly accessible quality control and evaluation framework. This paper details the sources and processing methodologies underlying the dataset, and further illustrates the advantage of in-situ observations in capturing hyper-local and extreme weather through comparative analyses and case studies. Using WeatherReal, we evaluated several data-driven models and compared them with leading numerical models. Our work aims to advance the AI-based weather forecasting research towards a more application-focused and operation-ready approach.
Atmospheric and Oceanic Physics, Machine Learning
What problem does this paper attempt to address?
The paper addresses the practical limitations of current data-driven weather forecasting models that are trained and evaluated on reanalysis data such as ERA5. Specifically:

1. **Deviations between reanalysis data and actual observations**: Although reanalysis data perform well in simulating atmospheric dynamics and physical processes, they differ significantly from actual observations in some key variables (such as near-surface temperature, wind speed, precipitation, and cloud cover), and these are the variables that matter most to the public.
2. **Deficiencies of reanalysis data in extreme weather events**: Studies have shown that ERA5 has limited ability to represent extreme events such as cold fronts, heat waves, and heavy rainfall.
3. **Spatial resolution limitations of reanalysis data**: Although the resolution of reanalysis products keeps increasing, their near-surface variables essentially represent average conditions within a grid cell and inevitably produce errors when applied to specific local sites.
4. **Impact of initial condition switching**: Models trained on reanalysis data must switch to analysis fields provided by numerical models as initial conditions in actual operations, which further affects model performance. In addition, the 12-hour assimilation window of ERA5 contains future information, which is not available in a real forecasting setting.

To overcome these problems, the paper introduces **WeatherReal**, a new benchmark dataset based on global near-surface in-situ observations. WeatherReal provides high-quality observation data together with an open-access quality control and evaluation framework. It aims to evaluate weather forecasting models directly against in-situ observations, thereby increasing their practical value and operational readiness.

### Main contributions

1. **Providing a unified, reliable, and easily accessible benchmark**: WeatherReal provides a unified standard for researching and applying weather models, enabling direct comparison between different models.
2. **Emphasizing the importance of actual observations**: By directly using near-surface in-situ observation data, WeatherReal highlights the practical value of these models in people's daily lives.

### Datasets and quality control

- **WeatherReal-ISD**: Based on the publicly available Integrated Surface Database (ISD), with strict post-processing and quality control applied.
- **WeatherReal-Synoptic**: Sourced from Synoptic Data PBC, covering more than 150,000 ground-based meteorological stations worldwide and providing a denser observation network.
- **MSN Weather user reports**: Weather reports collected from MSN Weather users, directly reflecting how users perceive the weather.

### Quality control algorithms

- **Value range check**: Sets absolute limits according to the global extreme values recorded by the World Meteorological Organization (WMO) and excludes records outside the specified range.
- **Distribution gap check**: Detects and flags outliers by comparing observations with ERA5 data.
- **Time-series check**: Detects and flags sudden changes (spikes) and long runs of identical consecutive values (persistence).
- **Cross-variable check**: Ensures logical consistency between variables; for example, the dew-point temperature should not exceed the air temperature, wind speed and direction should be consistent with each other, and precipitation accumulations should be consistent.

Through these methods, WeatherReal aims to provide a more reliable and practical benchmark for the research and application of weather forecasting models.
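The four quality control checks listed above can be illustrated with a minimal Python sketch for a single station's temperature series. This is not WeatherReal's actual implementation: the function names, the temperature limits, and every threshold (`max_diff`, `max_step`, `min_run`) are assumptions chosen for illustration, and the gap check is reduced to a simple absolute difference against a reference series standing in for ERA5.

```python
from typing import List, Sequence

# Placeholder limits in degrees Celsius (assumed for illustration; the
# summary only says the real limits follow WMO global extremes).
TEMP_MIN_C = -90.0
TEMP_MAX_C = 60.0


def range_check(temps: Sequence[float],
                lo: float = TEMP_MIN_C,
                hi: float = TEMP_MAX_C) -> List[bool]:
    """Value range check: True where a value lies outside [lo, hi]."""
    return [not (lo <= t <= hi) for t in temps]


def gap_check(obs: Sequence[float],
              reference: Sequence[float],
              max_diff: float = 15.0) -> List[bool]:
    """Simplified gap check: True where the observation differs from the
    reference series (e.g. ERA5 at the station) by more than max_diff."""
    return [abs(o - r) > max_diff for o, r in zip(obs, reference)]


def spike_check(temps: Sequence[float], max_step: float = 10.0) -> List[bool]:
    """Spike check: True where a point jumps away from its previous
    neighbour and back towards the next one by more than max_step."""
    flags = [False] * len(temps)
    for i in range(1, len(temps) - 1):
        d_prev = temps[i] - temps[i - 1]
        d_next = temps[i + 1] - temps[i]
        if abs(d_prev) > max_step and abs(d_next) > max_step and d_prev * d_next < 0:
            flags[i] = True
    return flags


def persistence_check(temps: Sequence[float], min_run: int = 24) -> List[bool]:
    """Persistence check: True for every value inside a run of at least
    min_run identical consecutive values."""
    flags = [False] * len(temps)
    start = 0
    for i in range(1, len(temps) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(temps) or temps[i] != temps[start]:
            if i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags


def dewpoint_check(temps: Sequence[float],
                   dewpoints: Sequence[float]) -> List[bool]:
    """Cross-variable check: True where dew point exceeds air temperature."""
    return [d > t for t, d in zip(temps, dewpoints)]


if __name__ == "__main__":
    temps = [12.0, 12.5, 40.0, 13.0]
    print(spike_check(temps))                 # the 40.0 reading is flagged
    print(dewpoint_check(temps, [10.0, 13.0, 20.0, 11.0]))
```

Each function returns a list of boolean flags rather than deleting records, so the flags from different checks can be combined (for example with a logical OR) before deciding which observations to exclude.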