Improving machine-learned surface NO₂ concentration mapping models with domain knowledge from data science perspective
Mengqian Hu, Kaixu Bai, Ke Li, Zhe Zheng, Yibing Sun, Liuqing Shao, Ruijie Li, Chaoshun Liu
DOI: https://doi.org/10.1016/j.atmosenv.2024.120372
IF: 5
2024-02-03
Atmospheric Environment
Abstract: Learning from big Earth data via supervised machine learning has become a popular approach for mapping ambient air quality. However, a knowledge gap remains as to which level of satellite data should be used as the critical proxy variable, and how to improve data-driven models with domain knowledge also remains elusive. Taking surface NO₂ concentration mapping as an illustration, we performed inter-comparison studies among a set of machine-learned surface NO₂ concentration estimation models built with different levels of satellite products, ranging from Level 1 (L1) apparent radiance from TROPOMI on board Sentinel-5p to Level 2 (L2) NO₂ slant column density (SCD) and tropospheric vertical column density (VCD). TROPOMI bands sensitive to surface NO₂ were first identified via radiative transfer simulations; band ratios between nine sensitive channels and their adjacent insensitive channels were then calculated and used as counterparts to the raw radiance observations. The results indicated that the prediction model trained with L1 band ratios at a few discrete channels yielded higher prediction accuracy (R² = 0.71, RMSE = 7.98 μg m⁻³) than the model using raw L1 radiance data at all available bands (R² = 0.68, RMSE = 8.40 μg m⁻³), largely owing to the improved signal-to-noise ratio and the reduced model complexity afforded by fewer band-ratio inputs. Even higher accuracies were attained with L2 data products: the model with SCD (R² = 0.78, RMSE = 6.54 μg m⁻³) performed slightly better than the model with VCD (R² = 0.77, RMSE = 6.79 μg m⁻³), even though the latter is expected to correlate better with surface NO₂ variations. Modeling accuracy was further improved by including solar zenith angle, aerosol optical depth, surface albedo and pressure, which are closely associated with the air mass factor, raising R² to 0.80 and reducing RMSE to 6.28 μg m⁻³.
Overall, our results not only provide actionable guidance for satellite-based surface NO₂ concentration modeling but also underscore the critical importance of domain knowledge in improving machine-learned models for large-scale air quality surveillance.
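The band-ratio features and the R²/RMSE scores described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions. The channel pairing is hypothetical because the abstract does not list the nine sensitive TROPOMI channels or their adjacent insensitive partners, so `HYPOTHETICAL_PAIRS` uses placeholder indices. R² is computed as the coefficient of determination (1 − SS_res/SS_tot), which may differ from the authors' definition, for example if they used the squared Pearson correlation. The sketch does not reproduce the study's actual regression model.

```python
import math
from typing import Sequence

# HYPOTHETICAL pairing: (NO2-sensitive channel index, adjacent insensitive channel index).
# The nine TROPOMI channel pairs actually used in the study are not given in the abstract.
HYPOTHETICAL_PAIRS = [(0, 1), (2, 3), (4, 5)]


def band_ratios(radiance: Sequence[float], pairs=HYPOTHETICAL_PAIRS) -> list[float]:
    """Divide each sensitive-channel radiance by its adjacent insensitive channel.

    The ratio is intended to suppress contributions that vary smoothly with
    wavelength (shared by both neighbouring channels) while keeping the
    NO2 absorption contrast. It also reduces the number of model inputs.
    """
    return [radiance[s] / radiance[r] for s, r in pairs]


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted concentrations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy radiances for six channels (arbitrary units, not real TROPOMI data).
    print(band_ratios([2.0, 4.0, 3.0, 4.0, 1.0, 8.0]))  # → [0.5, 0.75, 0.125]

    # Toy surface NO2 observations vs. predictions (μg m⁻³, not study data).
    obs, pred = [10.0, 20.0, 30.0], [12.0, 18.0, 30.0]
    print(round(r2(obs, pred), 4))    # → 0.96
    print(round(rmse(obs, pred), 3))  # → 1.633
```

Dividing by a neighbouring channel, instead of feeding all raw bands to the model, mirrors the abstract's point that fewer, more informative inputs can improve accuracy. In practice the ratios would be paired with ground-station NO₂ measurements to train whatever regression model is chosen.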
environmental sciences, meteorology & atmospheric sciences