Air quality index prediction based on three-stage feature engineering, model matching, and optimized ensemble

Yucheng Yin, Hui Liu
DOI: https://doi.org/10.1007/s11869-023-01380-7
2023-05-24
Abstract: Prompt and accurate prediction of the air quality index (AQI) has become a necessity for tackling mounting environmental threats. This paper proposes a feature-driven hybrid method for hourly, 3-step-ahead, deterministic AQI prediction, which consists of three modules. In Module 1, an "extract-merge-filter" feature engineering procedure is created to capture potential features from the AQI series, and ten feature sets are generated as candidates. In Module 2, six models (Light Gradient Boosting Machine, Extreme Gradient Boosting, Long Short-Term Memory, Convolutional Neural Network, Multilayer Perceptron, and Deep Neural Network) are developed as base predictors and run on the candidate feature sets. In Module 3, each predictor is first matched with its optimal feature set using a comprehensive metric, and the matched predictors are then combined in an ensemble whose weights are optimized with OPTUNA. A case study on AQI data from four Chinese cities is carried out to demonstrate the method. The experimental results show the following: (1) Feature engineering significantly boosts prediction performance and provides interpretable findings for practical use. (2) Feeding each predictor a customized feature set is more effective than using a fixed input and raises performance further. (3) OPTUNA is a promising tool for optimizing ensemble weights. The final ensemble model outperforms the single machine learning models and shows good robustness.
environmental sciences
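The weighted ensemble described in Module 3 of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain random search stands in for OPTUNA, RMSE stands in for the paper's comprehensive metric, and the two "models" and all AQI values are made-up toy numbers. The function names (`ensemble`, `rmse`, `search_weights`) are hypothetical.

```python
import random


def ensemble(preds, weights):
    """Weighted average of per-model predictions (one list per model)."""
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, preds)) / total
        for i in range(len(preds[0]))
    ]


def rmse(y_true, y_pred):
    """Root mean squared error (assumed metric, not the paper's)."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def search_weights(preds, y_true, n_trials=500, seed=0):
    """Random search over ensemble weights (a stand-in for OPTUNA)."""
    rng = random.Random(seed)
    n = len(preds)
    best_w = [1.0] * n  # start from the equal-weight ensemble
    best_err = rmse(y_true, ensemble(preds, best_w))
    for _ in range(n_trials):
        w = [rng.random() + 1e-9 for _ in range(n)]  # strictly positive weights
        err = rmse(y_true, ensemble(preds, w))
        if err < best_err:
            best_w, best_err = w, err
    total = sum(best_w)
    return [w / total for w in best_w], best_err  # weights normalized to sum to 1


# Toy validation data (invented): observed AQI and two base predictors,
# one biased high and one biased low.
y_true = [50, 60, 55, 70, 65]
preds = [
    [52, 63, 57, 73, 66],  # hypothetical model A
    [47, 58, 52, 68, 63],  # hypothetical model B
]

weights, err = search_weights(preds, y_true)
print("weights:", weights, "validation RMSE:", err)
```

Because the search starts from equal weights and only accepts improvements, the returned ensemble is never worse on the validation data than a simple average. In practice the weights would be tuned on a held-out validation split and evaluated on separate test data.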