Determination of major drive of ozone formation and improvement of O₃ prediction in typical North China Plain based on interpretable random forest model

Liyin Yao, Yan Han, Xin Qi, Dasheng Huang, Hanxiong Che, Xin Long, Yang Du, Lingshuo Meng, Xiaojiang Yao, Liuyi Zhang, Yang Chen
DOI: https://doi.org/10.1016/j.scitotenv.2024.173193
IF: 9.8
2024-05-23
The Science of The Total Environment
Abstract: O₃ pollution in China has become prominent in recent years and is now one of the most challenging issues in air pollution control. We used atmospheric pollutant and meteorological data from 2019 to 2021 to build an interpretable random forest (RF) model and applied it to predict O₃ concentrations in 2022 in five cities in the southwestern North China Plain. The model was also used to identify and explain the influence of various factors on O₃ formation. The R² between predicted and observed O₃ concentrations was 0.82, the MAE was 15.15 μg/m³, and the RMSE was 20.29 μg/m³, indicating that the model can effectively predict O₃ concentration in the study area. Correlation analysis, feature importance, and driving-factor analysis with SHapley Additive exPlanations (SHAP) indicated that temperature (T), NO₂, and relative humidity (RH) are the top three features affecting O₃ prediction, while the weights of wind speed and wind direction were relatively low. Thus, O₃ in the southwestern North China Plain may come mainly from local photochemical formation. The dominant factors also varied by season. In spring and autumn, O₃ pollution is more likely under high NO₂ concentrations and high temperatures; in summer, it is more likely under high-temperature, precipitation-free weather. In winter, NO₂ is the dominant factor in O₃ formation. Finally, the interpretable RF model was used to predict future O₃ concentrations from features provided by the Community Multiscale Air Quality (CMAQ) and Weather Research and Forecasting (WRF) models. This enhanced CMAQ's simulation of O₃ concentration to a certain extent, which helps forecast future O₃ pollution and guide pollution control.
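The three evaluation metrics quoted in the abstract (R², MAE, RMSE) can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper. R² is computed here in its coefficient-of-determination form, which is an assumption, since the abstract does not state which definition the authors used.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, mae, rmse) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    # Mean absolute error: average magnitude of the prediction errors.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large errors more strongly.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Assumed definition: R^2 = 1 - SS_res / SS_tot.
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return r2, mae, rmse


# Hypothetical O3 concentrations in μg/m³, for illustration only.
observed = [80.0, 120.0, 160.0, 100.0]
predicted = [90.0, 110.0, 150.0, 110.0]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}, MAE={mae:.2f}, RMSE={rmse:.2f}")
```

MAE and RMSE carry the units of the target (here μg/m³), while R² is dimensionless, which is why the abstract reports units only for the first two.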
environmental sciences