Construction of a virtual PM2.5 observation network in China based on high-density surface meteorological observations using the Extreme Gradient Boosting model

Ke Gui, Huizheng Che, Zhaoliang Zeng, Yaqiang Wang, Shixian Zhai, Zemin Wang, Ming Luo, Lei Zhang, Tingting Liao, Hujia Zhao, Lei Li, Yu Zheng, Xiaoye Zhang
DOI: https://doi.org/10.1016/j.envint.2020.105801
IF: 11.8
2020-08-01
Environment International
Abstract:<p>With growing public concern about air pollution in China, there is demand for long-term, continuous PM<sub>2.5</sub> datasets. However, China did not establish a national PM<sub>2.5</sub> observation network until the end of 2012. For earlier periods, satellite-retrieved aerosol optical depth (AOD) has frequently been used as the primary predictor of surface PM<sub>2.5</sub>. Satellite-retrieved AOD, however, often has incomplete daily coverage because of its sampling frequency and cloud interference, which greatly reduces the representativeness of AOD-based PM<sub>2.5</sub> estimates. Here, we constructed a virtual ground-based PM<sub>2.5</sub> observation network at 1180 meteorological sites across China using the Extreme Gradient Boosting (XGBoost) model, with high-density meteorological observations as the major predictors. Cross-validation showed that the XGBoost model is robust and accurate in estimating daily (monthly) PM<sub>2.5</sub> across China in 2018, with <em>R</em><sup>2</sup>, root-mean-square error (RMSE), and mean absolute error (MAE) values of 0.79 (0.92), 15.75 μg/m<sup>3</sup> (6.75 μg/m<sup>3</sup>), and 9.89 μg/m<sup>3</sup> (4.53 μg/m<sup>3</sup>), respectively. Surface visibility is the dominant predictor, accounting for 39.3% of the overall variable importance in the XGBoost model.</p><p>We then used meteorological and PM<sub>2.5</sub> data from 2017 to assess the predictive capability of the model. The results show that the XGBoost model can accurately hindcast historical PM<sub>2.5</sub> at monthly (<em>R</em><sup>2</sup> = 0.80, RMSE = 14.75 μg/m<sup>3</sup>), seasonal (<em>R</em><sup>2</sup> = 0.86, RMSE = 12.28 μg/m<sup>3</sup>), and annual (<em>R</em><sup>2</sup> = 0.81, RMSE = 10.10 μg/m<sup>3</sup>) mean levels.
In general, the newly constructed virtual PM<sub>2.5</sub> observation network, based on high-density surface meteorological observations, shows great potential for reconstructing historical PM<sub>2.5</sub> at ~1000 meteorological sites across China. It should help fill gaps in AOD-based PM<sub>2.5</sub> data and benefit other environmental studies, including epidemiology.</p>
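The abstract reports each validation score twice, once for daily values and once for monthly means. The sketch below shows one way such paired scores could be computed. It is an illustration under stated assumptions, not the authors' code: it takes R² to be the coefficient of determination (the paper may instead use the squared correlation of a fitted line), the function names `regression_scores` and `aggregate_monthly` are hypothetical, and the four sample rows are toy values rather than data from the study.

```python
import math
from collections import defaultdict


def regression_scores(observed, predicted):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values.

    Assumption: R^2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r2, rmse, mae


def aggregate_monthly(rows):
    """Average daily rows (site, month, observed, predicted) per site-month.

    Returns two aligned lists: monthly observed means and predicted means.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for site, month, obs, pred in rows:
        acc = sums[(site, month)]
        acc[0] += obs
        acc[1] += pred
        acc[2] += 1
    keys = sorted(sums)
    obs_means = [sums[k][0] / sums[k][2] for k in keys]
    pred_means = [sums[k][1] / sums[k][2] for k in keys]
    return obs_means, pred_means


# Toy daily records (hypothetical): site, month, observed, predicted, in ug/m^3.
rows = [
    ("A", 1, 10.0, 12.0),
    ("A", 1, 20.0, 18.0),
    ("A", 2, 30.0, 33.0),
    ("A", 2, 40.0, 39.0),
]

daily_scores = regression_scores([r[2] for r in rows], [r[3] for r in rows])
monthly_scores = regression_scores(*aggregate_monthly(rows))

print("daily   R2=%.3f RMSE=%.3f MAE=%.3f" % daily_scores)
print("monthly R2=%.3f RMSE=%.3f MAE=%.3f" % monthly_scores)
```

In this toy case, day-to-day errors of opposite sign partly cancel once values are averaged within a month, so the monthly scores come out better than the daily ones. The same mechanism may contribute to the gap between the daily and monthly figures reported in the abstract, although the sketch does not establish that.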
environmental sciences