VAR-tree model based spatio-temporal characterization and prediction of O3 concentration in China

Hongbin Dai, Guangqiu Huang, Jingjing Wang, Huibin Zeng
DOI: https://doi.org/10.1016/j.ecoenv.2023.114960
IF: 7.129
2023-06-01
Ecotoxicology and Environmental Safety
Abstract: Ozone (O<sub>3</sub>) pollution in the atmosphere is getting worse in many cities. To improve the accuracy of O<sub>3</sub> prediction and obtain the spatial distribution of O<sub>3</sub> concentration over a continuous period of time, this paper proposes a VAR-XGBoost model based on vector autoregression (VAR), the Kriging method, and XGBoost (Extreme Gradient Boosting). China is used as an example, and its spatial distribution of O<sub>3</sub> is simulated. O<sub>3</sub> concentration data from monitoring sites in China are obtained, a spatial prediction method for O<sub>3</sub> mass concentration based on the VAR-XGBoost model is established, and finally the influencing factors are analyzed. The paper concludes that O<sub>3</sub> has the highest correlation with PM<sub>2.5</sub> and the lowest correlation with SO<sub>2</sub>. Among the measurement factors, wind speed and temperature are the most important factors affecting O<sub>3</sub> pollution, and both are positively correlated with it. In addition, precipitation is negatively correlated with the 8-hour ozone concentration. The performance of the VAR-XGBoost model is evaluated using ten-fold cross-validation based on sample, site, and time, and is compared with the results of the XGBoost, CatBoost (categorical boosting), ExtraTrees, GBDT (gradient boosting decision tree), AdaBoost (adaptive boosting), RF (random forest), Decision tree, and LightGBM (light gradient boosting machine) models. The results show that the prediction accuracy of the VAR-XGBoost model is higher than that of the other models. The seasonal and annual average R<sup>2</sup> reaches 0.94 (spring), 0.93 (summer), 0.92 (autumn), 0.93 (winter), and 0.95 (average from 2016 to 2021). These results indicate that the VAR-XGBoost model is well suited to simulating the spatial distribution of O<sub>3</sub> concentrations in China.
The spatial distribution of O<sub>3</sub> concentrations across China is clearly high in the east and low in the west, and it is strongly influenced by topographical factors. Seasonally, the mean concentration is clearly low in winter and high in summer. The results of this study can provide a scientific basis for the prevention and control of regional O<sub>3</sub> pollution in China, and can also provide new ideas for acquiring data on the spatial distribution of O<sub>3</sub> concentrations within cities.
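The abstract mentions ten-fold cross-validation based on sample, site, and time but does not describe how the folds are built. The sketch below shows one common reading, assuming that "site-based" means every record from a monitoring site is held out together and "time-based" means every record from a given day is held out together. The function `grouped_kfold` and the synthetic `records` list are hypothetical and used only for illustration; they are not taken from the paper.

```python
import random
from collections import defaultdict


def grouped_kfold(groups, k=10, seed=0):
    """Return k (train_idx, test_idx) pairs in which all rows that share
    a group label fall into the same fold (illustrative helper)."""
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Shuffle the distinct labels, then deal them round-robin into k folds.
    labels = sorted(by_group)
    random.Random(seed).shuffle(labels)
    folds = [labels[i::k] for i in range(k)]

    splits = []
    for held_out in folds:
        held = set(held_out)
        test = sorted(i for g in held_out for i in by_group[g])
        train = sorted(i for g in labels if g not in held for i in by_group[g])
        splits.append((train, test))
    return splits


# Synthetic stand-in for monitoring data: (site_id, day_index) per record.
# 20 hypothetical sites x 30 days = 600 records.
records = [(f"S{s:02d}", d) for s in range(20) for d in range(30)]

# Sample-based: every record is its own group, so rows are split at random.
sample_cv = grouped_kfold(range(len(records)))
# Site-based: held-out sites never appear in training (tests spatial transfer).
site_cv = grouped_kfold([site for site, _ in records])
# Time-based: held-out days never appear in training (tests temporal transfer).
time_cv = grouped_kfold([day for _, day in records])
```

Under this reading, the site-based and time-based splits are the stricter checks, because the model is scored on locations or days it has not seen during training, whereas the sample-based split can place neighbouring records from the same site and day range on both sides.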
environmental sciences, toxicology