Post-processing for NWP Outputs Based on Machine Learning for 2022 Winter Olympics Games over Complex Terrain

Kang Yanyan, Li Haochen, Xia Jiangjiang, Zhang Yingxin
DOI: https://doi.org/10.5194/egusphere-egu2020-10463
2020-01-01
Abstract: Weather forecasts play an important role in the Olympic Games, especially for the mountain snow events, where they help identify a "window period" for competition. The Winter Olympics tracks are located on very complex terrain, so detailed weather forecasts are needed. A post-processing method based on machine learning is used to produce 10-day weather predictions at 1-km spatial and 1-hour temporal resolution, which can greatly improve the accuracy and refinement of numerical weather prediction (NWP). ECMWF/RMAPS model data and automatic weather station (AWS) data from 2015-2018 are used as training and test data, comprising 48 features and 4 labels (the observed 2 m temperature, relative humidity, 10 m wind speed, and wind direction). The model data are on grid points, while the AWS data are at station points. Instead of interpolating from the grid to the station, we use the 9 model grid points nearest to each station as predictors, so the number of features in the dataset becomes 48 × 9. This eliminates the grid-to-station interpolation error and takes the spatial distribution into account to some extent. The machine learning methods used are SVM, Random Forest, Gradient Boosting Decision Tree (GBDT), and XGBoost. XGBoost performs best, slightly better than GBDT and Random Forest. We carried out feature engineering before training and found that more features do not necessarily give a better model; 10 features are enough. Interestingly, features closely related to the label values, such as the model-output 2 m temperature, 10 m wind speed, and wind direction, become less important as the forecast lead time increases, while some features that forecasters do not usually pay attention to, such as latent heat flux and snow depth, become more important in the 6-10 day predictions.
It is therefore necessary to train the model with dynamic weight parameters for different forecast lead times. With machine-learning post-processing, forecast accuracy is greatly improved compared with the EC model. The forecast accuracy averaged over days 0-10 for 2 m relative humidity, 10 m wind speed, and wind direction increases by almost 15%, and the temperature accuracy increases by 20%-40% (40% for days 0-3, decreasing with forecast lead time).
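The step of using the 9 nearest model grid points in place of interpolation can be illustrated with a minimal sketch. It is not the authors' code: the function name `nine_point_features`, the regular latitude/longitude grid, the choice of a 3 × 3 block centred on the nearest grid point, and the random demo data are all assumptions made for illustration.

```python
import numpy as np

def nine_point_features(fields, grid_lats, grid_lons, st_lat, st_lon):
    """Flatten the 3x3 block of grid points around a station into one vector.

    fields     : array of shape (n_features, n_lat, n_lon) for one forecast time
    grid_lats  : 1-D latitudes of a regular grid (assumption)
    grid_lons  : 1-D longitudes of a regular grid (assumption)
    st_lat/lon : station coordinates
    """
    fields = np.asarray(fields)
    n_lat, n_lon = fields.shape[1], fields.shape[2]

    # Index of the grid point nearest to the station along each axis.
    i = int(np.argmin(np.abs(np.asarray(grid_lats) - st_lat)))
    j = int(np.argmin(np.abs(np.asarray(grid_lons) - st_lon)))

    # Clamp the centre so the 3x3 block always lies inside the grid.
    i = min(max(i, 1), n_lat - 2)
    j = min(max(j, 1), n_lon - 2)

    block = fields[:, i - 1:i + 2, j - 1:j + 2]  # shape (n_features, 3, 3)
    return block.reshape(-1)                     # length n_features * 9

# Hypothetical demo: 48 model variables on a 20 x 30 grid of synthetic values.
rng = np.random.default_rng(0)
fields = rng.normal(size=(48, 20, 30))
lats = np.linspace(40.0, 41.9, 20)
lons = np.linspace(115.0, 117.9, 30)

x = nine_point_features(fields, lats, lons, st_lat=40.52, st_lon=115.81)
print(x.shape)  # (432,), i.e. 48 * 9 features for this station and time
```

Each station and forecast time then contributes one such row to the training matrix. No interpolated value is ever computed, so a learner is left to weight the neighbouring grid points itself.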