Machine learning-based prediction of air quality index and air quality grade: a comparative analysis
S. A. Aram, E. A. Nketiah, B. M. Saalidong, H. Wang, A.-R. Afitiri, A. B. Akoto, P. O. Lartey
DOI: https://doi.org/10.1007/s13762-023-05016-2
2023-06-08
International Journal of Environmental Science and Technology
Abstract: This study compared machine learning models for predicting the daily air quality index (AQI) and classifying the air quality grade (AQG). It used publicly available data from 2014 to 2019 for six pollutants (PM10, PM2.5, NO2, SO2, CO, and O3). Four models (random forest (RF), gradient boosting (GB), Lasso regression (LASSO), and a stacked regressor) were used to predict AQI, while six models (K-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), multilayer perceptron (MLP), random forest (RF), and a stacked classifier) were used to forecast AQG. The models were evaluated with several statistical measures: R-squared (R²), root mean square error (RMSE), and mean absolute error (MAE) for AQI, and accuracy score (ACC), Matthews correlation coefficient (MCC), and F1 score for AQG. For AQI prediction, the stacked model performed consistently across all metrics, with an R² of 0.973, an RMSE of 7.568, and an MAE of 4.596, outperforming LASSO, GB, and RF. This indicates that the stacked model was able to minimize the weaknesses of the individual models and provide a more accurate prediction. For AQG, the stacked model also performed better across all metrics, with an ACC of 0.970, an MCC of 0.960, and an F1 of 0.970, outperforming MLP, KNN, SVM, DT, and RF. The study concluded that stacked generalization models can forecast air quality index and grade with high efficiency and precision, mitigating the overfitting concerns associated with individual models.
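The evaluation measures named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names, the toy inputs in the usage note, and the choice of macro-averaged F1 and the multiclass form of MCC are all assumptions made for illustration, since the abstract does not state which averaging the study used.

```python
import math
from collections import Counter


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for predicted vs. observed AQI values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def classification_metrics(y_true, y_pred):
    """Return accuracy, multiclass MCC, and macro F1 for predicted AQG labels."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each grade truly occurs
    pred_counts = Counter(y_pred)  # how often each grade is predicted

    # Multiclass MCC computed from label counts (assumed generalization).
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in labels)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in labels)
    denom = math.sqrt(cov_tt * cov_pp)
    mcc = cov_tp / denom if denom else 0.0

    # Macro F1: unweighted mean of per-class F1 (assumed averaging scheme).
    f1_scores = []
    for k in labels:
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        support = true_counts[k] + pred_counts[k]  # equals 2*TP + FP + FN
        f1_scores.append(2 * tp / support if support else 0.0)

    return {
        "acc": correct / n,
        "mcc": mcc,
        "f1": sum(f1_scores) / len(f1_scores),
    }
```

As a usage example on made-up values, `regression_metrics([50, 100, 150, 200], [60, 90, 160, 190])` scores a hypothetical AQI predictor, and `classification_metrics(["good", "good", "moderate", "poor"], ["good", "moderate", "moderate", "poor"])` scores a hypothetical grade classifier. The grade names here are placeholders, not the categories used in the study.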
environmental sciences