Extracting Regional and Temporal Features to Improve Machine Learning for Hourly Air Pollutants in Urban India

Shuai Wang, Mengyuan Zhang, Hui Zhao, Peng Wang, Sri Harsha Kota, Qingyan Fu, Hongliang Zhang
DOI: https://doi.org/10.1016/j.atmosenv.2024.120834
IF: 5
2024-01-01
Atmospheric Environment
Abstract: India suffers from severe particulate matter (PM, including PM2.5 and PM10) pollution, yet its limited ground observations are insufficient to support a comprehensive understanding of the associated health risks. Machine learning (ML) has the potential to improve estimates of PM distribution and exposure efficiently. Regional transport, together with the accumulation and dispersion processes of PM and its components, strongly influences PM concentrations and is therefore crucial to account for when building ML models, especially for sparsely observed regions like India. Here, geographic weighting and temporal-rolling weighting methods were used to extract regional and temporal features, respectively, to improve ML model performance. Incorporating these temporal and regional features significantly improved the model: the root mean square error (RMSE) fell by 21% for PM2.5 and 19% for PM10, and model underestimation in heavy pollution scenarios was reduced. The spatial-temporal model achieves out-of-sample cross-validation (CV) coefficients of determination (R²) of 0.87 and 0.88 for hourly PM2.5 and PM10, respectively. The ML model predicts an annual nationwide mean PM2.5 concentration of 68.3 μg/m³, with high values in the north (especially the Indo-Gangetic Plain) and low values in the south, a pattern consistent with high satellite aerosol optical depth (AOD) values. Boundary layer height is identified as the main meteorological factor influencing PM2.5 concentrations in winter. Extracting features that characterize the regional transport and cumulative dispersion of pollutants can aid machine learning training, and this method can be further improved and applied in other studies.
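The abstract names geographic weighting and temporal-rolling weighting but does not give their formulas. As a rough illustration only, the sketch below assumes inverse-distance weighting within a hypothetical 300 km radius for the regional feature, and a decay-weighted mean over the preceding hours for the temporal-rolling feature. The names `regional_feature`, `temporal_feature`, `power`, `max_km`, `window`, and `decay` are hypothetical and do not describe the authors' implementation. The block also computes RMSE and R² (here taken as 1 - SS_res/SS_tot), the two metrics the abstract reports.

```python
import math
from typing import List, Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def regional_feature(
    target: Tuple[float, float],
    neighbors: Sequence[Tuple[float, float, float]],
    power: float = 2.0,
    max_km: float = 300.0,  # hypothetical search radius, not from the paper
) -> Optional[float]:
    """Inverse-distance-weighted mean of neighboring values (ASSUMED weighting).

    neighbors holds (lat, lon, value) tuples. Points at distance 0 (the target
    itself) or beyond max_km are skipped. Returns None if no neighbor qualifies.
    """
    num = den = 0.0
    for lat, lon, value in neighbors:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0 or d > max_km:
            continue
        w = d ** -power  # closer neighbors get larger weights
        num += w * value
        den += w
    return num / den if den > 0 else None


def temporal_feature(
    series: Sequence[float], window: int = 24, decay: float = 0.9
) -> List[Optional[float]]:
    """Decay-weighted rolling mean over the previous `window` hours (ASSUMED weighting).

    The value at hour t uses only series[t - window : t], i.e. strictly past hours.
    The value k + 1 hours back gets weight decay ** k. Hour 0 has no history -> None.
    """
    out: List[Optional[float]] = []
    for t in range(len(series)):
        past = list(series[max(0, t - window):t])
        if not past:
            out.append(None)
            continue
        recent_first = past[::-1]  # most recent hour first
        weights = [decay ** k for k in range(len(recent_first))]
        out.append(sum(w * v for w, v in zip(weights, recent_first)) / sum(weights))
    return out


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```

In a setup like the one the abstract describes, features of this kind would be computed per site and per hour (for PM or for meteorological variables such as boundary layer height) and appended to the model inputs. Restricting `temporal_feature` to strictly past hours keeps the feature from leaking the value being predicted.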
What problem does this paper attempt to address?