Data Imputation of OMI NO₂ by Combining Multi-Source Data Through a 2-Step Machine Learning Method over China, 2007-2020

Wei Zhao, Riyang Liu, Yanchuan Shao, Weihan Li, Jun Bi, Zongwei Ma
DOI: https://doi.org/10.2139/ssrn.4060131
2022-01-01
SSRN Electronic Journal
Abstract: The Ozone Monitoring Instrument (OMI) aboard NASA's EOS-Aura satellite provides tropospheric NO₂ vertical column densities (VCDs) with global coverage at a spatial resolution of 13 km × 24 km and a daily temporal resolution, and these data are widely used in ground-level NO₂ estimation. However, extensive non-random gaps in OMI NO₂, caused by "row anomaly" failures and cloud reflections, severely degrade data quality and hinder related research. In this study, we adopted a two-step machine learning method to impute missing OMI NO₂ across China. In the first step, we used tropospheric NO₂ from the Second Global Ozone Monitoring Experiment (GOME-2), another satellite data product with operating parameters similar to those of OMI, as a predictor to establish the OMI-GOME-2 combination model, which increased the data volume by 15%-40% and achieved higher spatial coverage. In the second step, a 3-day random forest (3d-RF) data imputation model was built to interpolate the data over the entire study area. The two steps achieved robust performance, with cross-validation R² of 0.77-0.85 and 0.73-0.78, respectively. Finally, a high-resolution (0.1° × 0.1°) tropospheric NO₂ dataset with full spatial coverage over China for 2007-2020 was obtained, which is expected to contribute to subsequent ground-level concentration estimation and environmental health research.
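The two-step structure described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: the function names, the three generic covariates, the synthetic NO₂ field, and the random forest settings are all assumptions made for illustration, and the 3-day windowing of the 3d-RF model is omitted because the abstract does not specify it. The sketch assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fill_with_gome2(omi, gome2, covariates, seed=0):
    """Step 1 (sketch): learn OMI from GOME-2 where both exist,
    then fill OMI gaps wherever GOME-2 is available."""
    both = ~np.isnan(omi) & ~np.isnan(gome2)
    fillable = np.isnan(omi) & ~np.isnan(gome2)
    features = np.column_stack([gome2, covariates])
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(features[both], omi[both])
    out = omi.copy()
    if fillable.any():
        out[fillable] = model.predict(features[fillable])
    return out


def impute_remaining(no2, covariates, seed=0):
    """Step 2 (sketch): train a random forest on cells that now have
    a value and predict the cells that are still missing."""
    known = ~np.isnan(no2)
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(covariates[known], no2[known])
    out = no2.copy()
    if (~known).any():
        out[~known] = model.predict(covariates[~known])
    return out


# Synthetic stand-in for one day of gridded data (hypothetical values).
rng = np.random.default_rng(0)
n_cells = 600
covariates = rng.uniform(size=(n_cells, 3))  # e.g. location, meteorology
truth = 5 + 10 * covariates[:, 0] + 3 * np.sin(6 * covariates[:, 1])
omi = truth + rng.normal(0, 0.3, n_cells)
gome2 = 0.9 * truth + rng.normal(0, 0.5, n_cells)
omi[rng.random(n_cells) < 0.5] = np.nan    # simulated OMI gaps
gome2[rng.random(n_cells) < 0.4] = np.nan  # simulated GOME-2 gaps

step1 = fill_with_gome2(omi, gome2, covariates)
step2 = impute_remaining(step1, covariates)

# Fraction of missing cells before step 1, after step 1, after step 2.
print(np.isnan(omi).mean(), np.isnan(step1).mean(), np.isnan(step2).mean())
```

Observed OMI values are left untouched in both steps; only missing cells are filled, first from the GOME-2 relationship and then from the covariate-only model.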