A review of machine learning for modeling air quality: Overlooked but important issues

Die Tang, Yu Zhan, Fumo Yang
DOI: https://doi.org/10.1016/j.atmosres.2024.107261
IF: 5.965
2024-01-23
Atmospheric Research
Abstract: Machine learning models based on satellite remote sensing are widely used to estimate ground-level air pollutant concentrations, overcoming the limitation of the spatially discontinuous distribution of ground monitoring stations. However, because environmental modeling is interdisciplinary, atmospheric researchers may overlook some important issues when using machine learning. In this review, we summarize and discuss overlooked but important issues in data preparation, model development, validation, and prediction, including feature engineering, imbalanced data, validation strategy, and model interpretation, all of which are critical for model generalizability. First, we provide considerations and recommendations for obtaining, selecting, and using data for the main variables in machine learning for air quality mapping. Second, we introduce and discuss feature engineering and the handling of imbalanced data, both of which can enhance data representativeness and improve model performance during model development. Third, we analyze and compare model validation strategies and suggest the situations in which each is applicable. Finally, we propose that placing importance on model interpretation during model development and prediction can guide model improvements. We review several commonly used model interpretation methods, clarify their interpretation scope, and advance their application in model diagnostics. In addressing these issues, this review provides in-depth, practical guidance on applying machine learning for robust air quality mapping.
meteorology & atmospheric sciences