Spatiotemporal ozone pollution LUR models: Suitable statistical algorithms and time scales for a megacity scale

Jiawei Wang, Daniel S. Cohan, He Xu
DOI: https://doi.org/10.1016/j.atmosenv.2020.117671
IF: 5
2020-09-01
Atmospheric Environment
Abstract:<p>Ambient air ozone (O<sub>3</sub>), a secondary photochemical pollutant, is seriously harmful to human health. Accurate estimation of O<sub>3</sub> exposure requires the ability to monitor O<sub>3</sub> surface concentration with a high spatiotemporal resolution. Several spatiotemporal land use regression (LUR) models have integrated meteorological factors based on different statistical algorithms to support such epidemiological studies. Among these algorithms, we aim to identify an efficient modeling method, as well as the most suitable length of the modeling period (time scale). Three typical types of spatiotemporal LUR models, based on parametric, semi-parametric, and non-parametric statistical methods, respectively, are considered to predict daily ground-level O<sub>3</sub> in the megacity of Tianjin, China. Built on monthly, seasonal (cold and warm), and annual time scales, the models are: a series of monthly hybrid LUR (Two-stage) models, each consisting of two sub-models based on the multiple linear regression (MLR) algorithm; generalized additive mixed models (GAMMs); and land use random forest (LURF) models. Leave-one-out cross-validation was performed to evaluate the temporal and spatial predictive accuracy of each model using the adjusted coefficient of determination (adjR<sup>2</sup><sub>CV</sub>) and root mean square error (RMSE<sub>CV</sub>). For the GAMMs and LURF models, those built on a shorter time scale (monthly models) outperformed those built on a longer one. Among the monthly models, the GAMMs performed best, with the highest average adjR<sup>2</sup><sub>CV</sub> (0.747) and the lowest average RMSE<sub>CV</sub> (15.721 μg/m<sup>3</sup>), followed by the LURF models (average adjR<sup>2</sup><sub>CV</sub> = 0.695, average RMSE<sub>CV</sub> = 16.405 μg/m<sup>3</sup>) and the Two-stage models (average adjR<sup>2</sup><sub>CV</sub> = 0.466, average RMSE<sub>CV</sub> = 23.934 μg/m<sup>3</sup>).
Thus, the modeling format consisting of a shorter time scale and the GAMM algorithm performs relatively well in predicting daily O<sub>3</sub> pollution on a megacity scale. These findings can be used to select appropriate modeling methods for epidemiological research of O<sub>3</sub> pollution.</p>
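
The leave-one-out evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single hypothetical predictor (temperature), a plain least-squares line in place of the Two-stage, GAMM, or LURF models, synthetic data, and an adjusted R<sup>2</sup> computed as 1 - (1 - R<sup>2</sup>)(n - 1)/(n - p - 1) from the held-out predictions. The names `fit_line` and `loocv_metrics` are invented for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_metrics(xs, ys):
    """Leave-one-out CV: refit without sample i, then predict sample i.

    Returns (adjusted R^2, RMSE), both computed from held-out predictions.
    """
    n = len(xs)
    p = 1  # number of predictors in this illustrative model
    preds = []
    for i in range(n):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(intercept + slope * xs[i])
    mean_y = sum(ys) / n
    ss_res = sum((y - q) ** 2 for y, q in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    rmse = math.sqrt(ss_res / n)
    return adj_r2, rmse


# Synthetic data (assumption): ozone rises linearly with temperature plus noise.
random.seed(0)
temps = [random.uniform(5.0, 35.0) for _ in range(40)]
ozone = [20.0 + 3.0 * t + random.gauss(0.0, 8.0) for t in temps]

adj_r2, rmse = loocv_metrics(temps, ozone)
print(f"adjR2_CV = {adj_r2:.3f}, RMSE_CV = {rmse:.3f}")
```

Because each prediction comes from a model that never saw the held-out sample, both metrics reflect out-of-sample accuracy rather than goodness of fit. Holding out a whole monitoring site or a whole day instead of a single sample would follow the same pattern.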
environmental sciences, meteorology & atmospheric sciences