Joint Features Random Forest (JFRF) Model for Mapping Hourly Surface PM2.5 over China

Lechao Dong, Siwei Li, Jia Xing, Hao Lin, Shansi Wang, Xiaoyue Zeng, Yaming Qin
DOI: https://doi.org/10.1016/j.atmosenv.2022.118969
IF: 5
2022-01-01
Atmospheric Environment
Abstract: Ambient PM2.5 exhibits strong regional patterns because it can be transported over long distances, which implies that including features from surrounding stations may improve the accuracy of machine-learning models that estimate surface PM2.5 from satellite-retrieved aerosol optical depth (AOD). However, most current models either use only single-point features or simply average the observed surface PM2.5 from adjacent stations using a fixed spatial proportional relationship. How to properly exploit the features of surrounding stations when retrieving PM2.5 has not yet been well addressed. Here we propose an integrated algorithm, the joint features random forest (JFRF) model, which uses the feature differences between the target pixel and surrounding stations, together with the station observations, to learn their dynamic relationship with PM2.5 at the target pixel, rather than relying only on a weighted average feature (WAF) of surface PM2.5 as traditional models do. Cross-validation results suggest that JFRF (R² = 0.61-0.8; RMSE = 15.97-20.91 μg/m³) performs better than the single-point feature model (ΔR² = 0.09-0.3). JFRF also outperforms traditional WAF-based models (ΔR² = 0.05-0.11), particularly in regions with large AOD gradients (which account for 33% of the total test set); this is of great significance for accurately representing the spatial heterogeneity of PM2.5 (e.g., pollution edges and hot spots). Excluding AOD from the features significantly reduced model performance (ΔR² = -0.07 to -0.1). Our study therefore demonstrates the importance of the feature differences with surrounding stations and of satellite-retrieved AOD in representing the regional pattern of PM2.5, and thereby in helping machine-learning models estimate surface PM2.5 more accurately.
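The contrast the abstract draws between a weighted average feature and joint features can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the inverse-distance weighting, the choice of k nearest stations, and the example feature names (aod, rh) are all assumptions made for illustration, since the abstract does not specify them.

```python
def inverse_distance_weights(distances_km, power=2.0):
    """Normalized inverse-distance weights (an assumed weighting scheme)."""
    raw = [1.0 / max(d, 1e-6) ** power for d in distances_km]
    total = sum(raw)
    return [w / total for w in raw]


def waf_feature(neighbor_pm25, distances_km):
    """WAF-style input: one fixed-weight average of neighboring PM2.5."""
    weights = inverse_distance_weights(distances_km)
    return sum(w * p for w, p in zip(weights, neighbor_pm25))


def joint_features(target, neighbors, feature_names, k=3):
    """Joint-feature-style input (hypothetical layout).

    The row holds the target pixel's own features, then, for each of the
    k nearest stations, the feature differences (target minus station)
    followed by that station's observed PM2.5. The regressor is left to
    learn how the differences modulate each station's contribution.
    """
    nearest = sorted(neighbors, key=lambda s: s["distance_km"])[:k]
    row = [target[name] for name in feature_names]
    for station in nearest:
        row.extend(target[name] - station[name] for name in feature_names)
        row.append(station["pm25"])
    return row


# Made-up example values, for illustration only.
target_pixel = {"aod": 0.8, "rh": 60.0}
stations = [
    {"aod": 1.0, "rh": 70.0, "pm25": 80.0, "distance_km": 20.0},
    {"aod": 0.5, "rh": 55.0, "pm25": 40.0, "distance_km": 10.0},
]
waf = waf_feature([s["pm25"] for s in stations],
                  [s["distance_km"] for s in stations])
row = joint_features(target_pixel, stations, ["aod", "rh"], k=2)
```

Rows built this way could then be passed to any random forest regressor (for example scikit-learn's RandomForestRegressor) together with the observed PM2.5 at the target location. The WAF value collapses the neighbors into a single number before training, whereas the joint row keeps each neighbor's observation and its feature differences separate.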