Spatial prediction of PM2.5 concentration using hyper-parameter optimization XGBoost model in China

Yingqiang Song, Changjian Zhang, Xin Jin, Xiaoyu Zhao, Wei Huang, Xiaoshuang Sun, Zhongkang Yang, Shuhuan Wang
DOI: https://doi.org/10.1016/j.eti.2023.103272
IF: 7.758
2023-11-01
Environmental Technology & Innovation
Abstract: Fine particulate matter (PM2.5) pollution has become the main pollutant damaging the atmospheric environment and endangering human health. Accurate prediction of the spatial variability of PM2.5 concentrations using high-performance models is crucial for the prevention and control of atmospheric pollution. In this paper, to avoid the uncertainty caused by manually setting the parameters of the prediction model, a new hyper-parameter optimization extreme gradient boosting (HPO-XGBoost) model for spatial prediction of PM2.5 concentration was proposed. We combined aerosol optical depth (AOD) data from the Himawari-8 satellite with the HPO-XGBoost model to predict and map the spatial variability of PM2.5 concentrations during the COVID-19 lockdown in China. The results showed that the maximum concentration of PM2.5 exhibited a downward trend and significant seasonal fluctuation from 2015 to 2020. Compared with the grid search (GS) and random grid search (RGS) optimization algorithms, the tree-structured Parzen estimator-XGBoost (TPE-XGBoost) approach achieved the highest prediction accuracy for PM2.5 concentrations in 2020, with R² values of 89.37% (January), 85.58% (February), 80.02% (March), and 83.68% (April). Spatial mapping of the monthly average PM2.5 concentration from January to April using the TPE-XGBoost model showed that regions with higher PM2.5 concentrations were concentrated in northern, eastern and northeastern China, where PM2.5 concentrations exceeded 150 μg m⁻³ in January. Furthermore, the spatial correlation analysis showed that the factors with the strongest driving effect on PM2.5 concentrations during the COVID-19 lockdown were factory (FAC) and traffic (TRA), and the two-factor interactions with the strongest contribution (q > 0.8) in all four months included factory-residence (FAC-RES) and factory-population (FAC-POP). Based on concentration-weighted trajectory (CWT) analysis, the contributing sources of PM2.5 transport formed a long-distance channel in Taiyuan, Xi'an and Shijiazhuang, while clustered and mixed distributions of contributing sources were found in Zhengzhou, Changchun and Harbin. These results showed that TPE-XGBoost has strong nonlinear explanatory power and is effective and feasible for spatial prediction of PM2.5 concentrations, and that PM2.5 transport is dynamic and should be taken into account when developing atmospheric pollution prevention and control measures during an epidemic period.
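The abstract reports q values for single driving factors and for factor pairs (e.g., FAC-RES and FAC-POP with q > 0.8). These appear to be the q-statistic of the geographical detector (Geodetector) method, q = 1 − Σₕ Nₕσₕ² / (Nσ²), where h indexes the strata of a factor. That attribution is an assumption, since the abstract does not name the method. The minimal pure-Python sketch below computes the factor-detector q and the interaction-detector q (the q of the overlay of two factors' strata). The data and stratum labels are hypothetical toy values, not the paper's data.

```python
from collections import defaultdict
from statistics import pvariance


def q_statistic(y, strata):
    """Geodetector factor-detector q (assumed to match the paper's q).

    q = 1 - sum_h(N_h * var_h) / (N * var), using population variances,
    where h indexes the strata (classes) of one explanatory factor.
    """
    if len(y) != len(strata):
        raise ValueError("y and strata must have the same length")
    total_var = pvariance(y)
    if total_var == 0:
        raise ValueError("y has zero variance; q is undefined")
    # Group the response values by stratum label.
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    # Within-strata sum of squares: N_h * var_h summed over all strata.
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1 - within / (len(y) * total_var)


def interaction_q(y, strata_a, strata_b):
    """Interaction detector: q of the overlay of two factors' strata."""
    if len(strata_a) != len(strata_b):
        raise ValueError("strata_a and strata_b must have the same length")
    # Each (a, b) label pair defines one stratum of the overlaid layer.
    return q_statistic(y, list(zip(strata_a, strata_b)))


if __name__ == "__main__":
    # Hypothetical toy data: PM2.5 values (ug/m3) at six sites and two
    # already-discretized factors; the labels are illustrative only.
    pm25 = [10.0, 12.0, 30.0, 32.0, 50.0, 52.0]
    fac = ["low", "low", "mid", "mid", "high", "high"]
    tra = ["a", "b", "a", "b", "a", "b"]
    print("q(FAC)     =", round(q_statistic(pm25, fac), 4))
    print("q(TRA)     =", round(q_statistic(pm25, tra), 4))
    print("q(FAC∩TRA) =", round(interaction_q(pm25, fac, tra), 4))
```

For this toy data, the FAC strata explain almost all of the variance (q ≈ 0.9963), TRA alone explains almost none (q ≈ 0.0037), and their overlay reaches q = 1.0. This illustrates how a two-factor interaction can explain more variance than either factor alone. Continuous drivers must be discretized into strata before this calculation. The binning scheme is a modelling choice that the abstract does not specify.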
environmental sciences; engineering, environmental; biotechnology & applied microbiology
What problem does this paper attempt to address?