Estimation of daily NO₂ with explainable machine learning model in China, 2007–2020

Yanchuan Shao, Wei Zhao, Riyang Liu, Jianxun Yang, Miaomiao Liu, Wen Fang, Litiao Hu, Matthew Adams, Jun Bi, Zongwei Ma
DOI: https://doi.org/10.1016/j.atmosenv.2023.120111
IF: 5
2023-09-27
Atmospheric Environment
Abstract: Surface nitrogen dioxide (NO₂) is an effective indicator of anthropogenic combustion and is associated with the regional burden of disease. Although satellite-borne column NO₂ is widely combined with sophisticated models to derive surface concentrations, long-term, full-coverage estimation is hindered by incomplete satellite retrievals. Moreover, the mechanical relationship between surface and tropospheric NO₂ is often ignored in machine learning (ML) approaches. Here we develop a gap-filling method that obtains full-coverage column NO₂ by fusing satellite data from different sources. Surface NO₂ in China is then estimated for 2007–2020 using the XGBoost model, with a daily out-of-sample cross-validation (CV) R² of 0.75 and a root-mean-square error (RMSE) of 9.11 μg/m³. The back-extrapolation performance is verified through by-year CV (daily R² = 0.60 and RMSE = 11.46 μg/m³) and external estimations in Taiwan before 2013 (daily R² = 0.69 and RMSE = 8.59 μg/m³). We explore the variable impacts in three hotspots of eastern China through SHAP (Shapley additive explanation) values. We find that column NO₂ is a driving contributor to the variation in ground-level pollution during 2007–2020 (average SHAP = 5.09 μg/m³, compared with a baseline concentration of 33.39 μg/m³). The estimated effect is also compared with that of an ordinary least squares (OLS) model to provide a straightforward interpretation. We demonstrate that employing an explainable ML model helps in understanding the coupled relationships behind surface NO₂ change.
environmental sciences, meteorology & atmospheric sciences
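
The by-year CV mentioned in the abstract can be illustrated with a minimal sketch: each year is held out in turn, a model is fitted on the remaining years, and R² and RMSE are computed on the pooled held-out predictions. Everything below is an assumption made for illustration and is not the authors' pipeline: the data are synthetic, the two predictors (`column_no2`, `temperature`) are hypothetical, and a plain OLS fit stands in for XGBoost so that the example needs only NumPy.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_ols(X, y):
    """Stand-in learner (assumption): OLS with an intercept, not XGBoost."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def by_year_cv(X, y, years, fit=fit_ols, predict=predict_ols):
    """Hold out one year at a time and score the pooled held-out predictions."""
    y_pred = np.empty_like(y, dtype=float)
    for year in np.unique(years):
        test = years == year
        model = fit(X[~test], y[~test])      # train on all other years
        y_pred[test] = predict(model, X[test])  # predict the held-out year
    return r2_score(y, y_pred), rmse(y, y_pred)


def make_synthetic(n_per_year=200, seed=0):
    """Synthetic daily samples; variable names and coefficients are made up."""
    rng = np.random.default_rng(seed)
    years = np.repeat(np.arange(2013, 2021), n_per_year)
    column_no2 = rng.gamma(shape=2.0, scale=3.0, size=years.size)
    temperature = rng.normal(15.0, 8.0, size=years.size)
    noise = rng.normal(0.0, 4.0, size=years.size)
    surface_no2 = 10.0 + 3.0 * column_no2 - 0.2 * temperature + noise
    X = np.column_stack([column_no2, temperature])
    return X, surface_no2, years


if __name__ == "__main__":
    X, y, years = make_synthetic()
    r2, err = by_year_cv(X, y, years)
    print(f"by-year CV: R2 = {r2:.2f}, RMSE = {err:.2f}")
```

Because `by_year_cv` takes `fit` and `predict` as arguments, a gradient-boosted model could be substituted for the OLS stand-in without changing the splitting or scoring logic.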