Evaluating traditional versus ensemble machine learning methods for predicting missing data of daily PM10 concentration
Elham Kalantari, Hamid Gholami, Hossein Malakooti, Mahdi Eftekhari, Poorya Saneei, Donya Esfandiarpour, Vahid Moosavi, Ali Reza Nafarzadegan
DOI: https://doi.org/10.1016/j.apr.2024.102063
IF: 4.831
2024-01-29
Atmospheric Pollution Research
Abstract: This study aimed to predict missing PM10 data for the city of Zabol using traditional learning methods (both eager and lazy learning) and ensemble learning. Daily minimum, average, and maximum values of weather variables were collected, along with daily PM10 concentration, from the Zabol airport weather station for the years 2013–2022. The predictive models were compared using the coefficient of determination (R²), mean absolute error (MAE), and mean squared error (MSE). The reconstruction results show that ensemble learning models, especially XGBoost, can be used effectively to predict missing PM10 data in time series. Among ensemble learning methods, boosting algorithms predicted missing PM10 data more accurately than bagging algorithms. Among traditional learning methods, lazy learning models performed better than eager learning models. Ranked from highest to lowest efficiency and accuracy in predicting missing PM10 data, the models are: XGBoost, random forest (RF), Extra Trees (ET), Light Gradient Boosting Machine (LightGBM), Decision Tree regressor with bagging, gradient boosting (GB), AdaBoost, Weighted K-Nearest Neighbor (WKNN), K-Nearest Neighbor (KNN), Decision Tree regressor with pasting, artificial neural network (ANN), Decision Tree (DT), and linear regression (LR). Given the high processing capability and potential of ensemble learning methods for predicting missing PM10 data, this technique is considered a useful way to save the time, energy, and cost of collecting and measuring data. It can also fill in data lost to equipment malfunction or damage, and the same approach can be used to predict pollutant concentrations in weather systems.
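As a rough illustration of the evaluation criteria named in the abstract, the sketch below computes R², MAE, and MSE in plain Python and pairs them with a small inverse-distance weighted KNN predictor of the kind the WKNN label suggests. It is not the authors' implementation: the function names, the inverse-distance weighting scheme, the choice of k, and all numeric values are assumptions made for illustration only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def wknn_predict(x_train, y_train, x, k=3):
    """Predict one value as the inverse-distance weighted mean of the
    k nearest training points (an assumed weighting scheme)."""
    nearest = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(x_train, y_train)
    )[:k]
    # An exact match has zero distance; return its target directly
    # to avoid dividing by zero.
    for distance, target in nearest:
        if distance == 0.0:
            return target
    weights = [1.0 / distance for distance, _ in nearest]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Made-up values, not data from the study.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
scores = {
    "MAE": mae(observed, predicted),
    "MSE": mse(observed, predicted),
    "R2": r2(observed, predicted),
}

# Hypothetical one-feature training set; the gap at x = 2.0 is filled
# from its two nearest neighbours.
features = [(0.0,), (1.0,), (3.0,)]
targets = [10.0, 20.0, 40.0]
filled = wknn_predict(features, targets, (2.0,), k=2)
```

Lower MAE and MSE and a higher R² indicate a better reconstruction. In a comparison like the one the abstract describes, these scores would be computed on observed values that were held out and then predicted by each model.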
environmental sciences