Comparative analysis of ensemble learning algorithms in water quality prediction

Farman Ullah Shah, Afed Ullah Khan, Abdul Waris Khan, Basir Ullah, Muhammad Rashid Khan, Ihrar Javed
DOI: https://doi.org/10.2166/hydro.2024.071
IF: 3.058
2024-12-04
Journal of Hydroinformatics
Abstract: Water is an essential resource necessary for the survival of all life forms, yet it is continually at risk of contamination. Accurate water quality prediction is essential for protecting ecosystem health. This study assesses the effectiveness of five ensemble learning techniques (AdaBoost, gradient boosting, XGBoost, CatBoost, and LightGBM) in predicting water quality parameters in the Bara River Basin, Pakistan. First, a random forest model was used to determine the combination of input water quality parameters for each selected target water quality variable. ML models were then developed for each combination of input parameters and target variable. The models' performance was assessed using three statistical indicators: R², mean squared error, and mean absolute error. The most suitable model was identified using compromise programming. The results show that XGBoost and gradient boosting outperform the other algorithms on these indicators, with near-perfect R² values for HCO3, CO3, and Mg from XGBoost and for electrical conductivity, SO4, Temp, and Ca from gradient boosting. In contrast, CatBoost and LightGBM performed more robustly on some parameters, such as pH and dissolved solids, but their performance on the other water quality parameters was weak.
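The evaluation step described in the abstract (score each model with R², mean squared error, and mean absolute error, then pick a winner with compromise programming) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the model names, observations, and predictions are made up, the criteria are weighted equally, and the ideal and worst points are assumed to be the best and worst values observed among the compared models, with an assumed exponent p = 2.

```python
from math import fsum


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fsum(y_true) / len(y_true)
    ss_res = fsum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = fsum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean squared error."""
    return fsum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return fsum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Criterion name -> True when larger values are better.
HIGHER_IS_BETTER = {"r2": True, "mse": False, "mae": False}


def compromise_ranking(metrics, p=2.0):
    """Rank models by their L_p distance to the ideal point (smaller is better).

    Assumptions: equal weights, and ideal/worst values taken from the
    models being compared rather than from theoretical bounds.
    """
    names = list(metrics)
    weight = 1.0 / len(HIGHER_IS_BETTER)
    totals = {name: 0.0 for name in names}
    for criterion, higher in HIGHER_IS_BETTER.items():
        values = [metrics[name][criterion] for name in names]
        best = max(values) if higher else min(values)
        worst = min(values) if higher else max(values)
        span = abs(best - worst)
        for name in names:
            # Normalised deviation from the ideal value, in [0, 1].
            dev = abs(best - metrics[name][criterion]) / span if span > 0 else 0.0
            totals[name] += (weight * dev) ** p
    ranked = [(name, total ** (1.0 / p)) for name, total in totals.items()]
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical observations and predictions, for illustration only.
y_true = [7.1, 7.4, 6.9, 7.8, 7.2]
base_error = [0.05, -0.04, 0.03, -0.06, 0.02]
predictions = {
    "model_a": [t + 1 * e for t, e in zip(y_true, base_error)],
    "model_b": [t + 2 * e for t, e in zip(y_true, base_error)],
    "model_c": [t + 4 * e for t, e in zip(y_true, base_error)],
}

metrics = {
    name: {
        "r2": r2_score(y_true, pred),
        "mse": mse(y_true, pred),
        "mae": mae(y_true, pred),
    }
    for name, pred in predictions.items()
}

ranking = compromise_ranking(metrics)
for name, distance in ranking:
    print(f"{name}: L_p distance = {distance:.3f}")
```

The model with the smallest distance is the closest compromise across all three indicators. Changing the weights or the exponent p shifts how strongly a single poor indicator penalises a model.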
environmental sciences; computer science, interdisciplinary applications; engineering, civil; water resources