Estimating particulate matter concentrations and meteorological contributions in China during 2000–2020

Shuai Wang, Peng Wang, Ruhan Zhang, Xia Meng, Haidong Kan, Hongliang Zhang
DOI: https://doi.org/10.1016/j.chemosphere.2023.138742
IF: 8.8
2023-04-24
Chemosphere
Abstract: Estimating the effects of airborne particulate matter (PM) on climate and human health depends heavily on accurate prediction of its concentration and size distribution. High-complexity machine learning models have been widely used for PM concentration prediction, but such models are often regarded as "black boxes" that lack interpretability. Here, a LightGBM model with a simple structure is built for ground-level PM estimation. The SHAP (SHapley Additive exPlanations) approach is then used to separate the meteorological contributions, because meteorology strongly influences PM concentration. The models perform well, with coefficients of determination (R²) of 0.84–0.88, 0.80–0.85, and 0.71–0.79 for PM2.5, PM10, and PM2.5-10 (2.5–10 μm), respectively. The LightGBM model trains 45 times faster than the XGBoost model while showing similar accuracy. More importantly, the models show small performance gaps between training and prediction (ΔR²: 0.07–0.12), which reduces the risk of overfitting. Daily PM datasets at 10 km resolution for the three size ranges are then generated over China for 2000–2020. The SHAP method agrees well with the meteorological normalization approach in separating the meteorological contributions (R² > 0.5). In the Beijing-Tianjin-Hebei region (BTH), meteorology has a greater influence on PM2.5-10 (−5.66% to 9.99%) than on PM2.5 and PM10. In the Yangtze River Delta (YRD) and the Pearl River Delta (PRD), albedo contributes substantially to PM2.5 concentration under the influence of solar radiation. Notably, relative humidity (RH) has different seasonal effects on PM of the three size ranges. In the BTH region, RH has a negative effect on PM2.5 (−0.52 μg/m³) and positive effects on PM10 (1.01 μg/m³) and PM2.5-10 (3.39 μg/m³) in spring, but the opposite effects in summer. The SHAP results are consistent with existing conclusions, suggesting that the approach is feasible for explaining haze formation. The generated PM datasets are useful for health assessment, environmental management, and climate change studies.
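The idea of attributing part of a prediction to meteorology can be illustrated with a minimal sketch. It assumes, as a simplification, that the meteorological contribution to one prediction is the sum of the Shapley values of the meteorological inputs. The function `toy_pm25`, its three features (an emissions index, RH, and wind speed) and all numbers are hypothetical; they are not the paper's LightGBM model or data. The `shap` package's TreeExplainer computes Shapley values for tree ensembles such as LightGBM with a tree-specific algorithm. This sketch instead uses brute-force enumeration of feature coalitions against a single baseline point, which is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition are fixed at their baseline value.
    Each feature requires 2^(n-1) coalitions, so this is only for small n.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with coalition features from x, the rest from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_pm25(z):
    # Hypothetical PM2.5 model (ug/m3): emissions index, RH (%), wind speed (m/s).
    emis, rh, wind = z
    return 20.0 + 0.8 * emis + 0.3 * rh - 4.0 * wind + 0.01 * emis * rh


x = [50.0, 70.0, 2.0]         # hypothetical day of interest
baseline = [40.0, 60.0, 3.0]  # hypothetical reference conditions
phi = shapley_values(toy_pm25, x, baseline)

met_idx = [1, 2]  # RH and wind are treated as the meteorological inputs here
met_contribution = sum(phi[i] for i in met_idx)
print([round(p, 3) for p in phi])   # [14.5, 7.5, 4.0]
print(round(met_contribution, 3))   # 11.5
```

By construction the Shapley values sum to `toy_pm25(x) - toy_pm25(baseline)` (26 here), so the meteorological share (11.5) and the emissions share (14.5) account for the whole change, with the emissions/RH interaction term shared between those two features.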