Monitoring the industrial waste polluted stream - Integrated analytics and machine learning for water quality index assessment
Ujala Ejaz, Shujaul Mulk Khan, Sadia Jehangir, Zeeshan Ahmad, Abdullah Abdullah, Majid Iqbal, Noreen Khalid, Aisha Nazir, Jens-Christian Svenning
DOI: https://doi.org/10.1016/j.jclepro.2024.141877
IF: 11.1
2024-04-05
Journal of Cleaner Production
Abstract: The Water Quality Index (WQI) is a primary metric used to evaluate and categorize surface water quality, and it plays a crucial role in the management of freshwater resources. Machine learning (ML) modeling offers potential insights into WQI prediction. This study employed advanced ML models to predict the WQI of the Aik-Stream, an industrially polluted natural water resource in Pakistan, using 19 input water quality variables and relating them to surrounding land use and anthropogenic activities. Six machine learning algorithms, namely Adaptive Boosting (AdaBoost), K-Nearest Neighbors (K-NN), Gradient Boosting (GB), Random Forests (RF), Support Vector Regression (SVR), and Bayesian Regression (BR), were employed as benchmark models to predict the WQI values of the polluted stream. For model calibration, 80% of the dataset was reserved for training and 20% was set aside for testing. In our comparative analyses of predictive models for the WQI, the Gradient Boosting (GB) model stood out as the best fit for its precision. Using a combination of just seven parameters (chemical oxygen demand, total organic carbon, oil & grease, ammonia-nitrogen, arsenic, nickel and zinc), it surpassed the other models in both training (R² = 0.88, RMSE = 7.24) and testing (R² = 0.85, RMSE = 8.67). Feature importance analysis showed that all the selected variables except NO₃-N, total dissolved solids (TDS) and temperature had an impact on the accuracy of the models' predictions. It is concluded that applying machine learning to assess water quality in polluted environments enhances accuracy and facilitates real-time tracking, enabling proactive risk mitigation.
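The evaluation workflow named in the abstract can be sketched in dependency-free Python. The pieces are an 80/20 train/test split, a gradient-boosting regressor, R² and RMSE on both splits, and normalized feature importance. This is a minimal illustration under stated assumptions, not the authors' code. The abstract does not give the implementation or hyperparameters, so the following are all hypothetical choices made for illustration: the booster uses depth-one regression stumps with squared-error loss, 100 rounds and a learning rate of 0.1, and importance is measured as each feature's share of total squared-error reduction. The data are synthetic, and the helper names (`train_test_split`, `fit_stump`, `gradient_boost`, `predict`, `r2_score`, `rmse`) are hypothetical.

```python
import math
import random


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out `test_fraction` of rows (80/20 by default)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(y) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def fit_stump(X, r):
    """Return (sse, feature, threshold, left_value, right_value) for the single
    split minimizing the squared error of residuals r, or None if none exists."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint between consecutive distinct values
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best


def gradient_boost(X, y, n_rounds=100, learning_rate=0.1):
    """Squared-error gradient boosting: each round fits a stump to the current
    residuals (the negative gradient) and adds a shrunken copy of it."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        sse, j, t, lv, rv = stump
        mean_r = sum(residuals) / len(residuals)
        # Importance (assumed measure): SSE reduction of this round's stump, before shrinkage.
        importance[j] += sum((v - mean_r) ** 2 for v in residuals) - sse
        stumps.append((j, t, lv, rv))
        pred = [p + learning_rate * (lv if row[j] <= t else rv) for p, row in zip(pred, X)]
    total = sum(importance) or 1.0
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate,
            "importance": [v / total for v in importance]}


def predict(model, X):
    """Sum the base value and every shrunken stump output for each row."""
    out = []
    for row in X:
        value = model["base"]
        for j, t, lv, rv in model["stumps"]:
            value += model["learning_rate"] * (lv if row[j] <= t else rv)
        out.append(value)
    return out


def r2_score(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def rmse(y, yhat):
    """Root mean squared error, in the same units as the target (WQI points)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


# Synthetic stand-in data (NOT the Aik-Stream measurements): 100 samples, 3 features.
# The target depends on features 0 and 1 only, so feature 2 is pure noise.
rng = random.Random(42)
X = [[rng.random() for _ in range(3)] for _ in range(100)]
y = [40 * row[0] + 20 * row[1] + rng.gauss(0, 2) for row in X]

X_tr, X_te, y_tr, y_te = train_test_split(X, y)
model = gradient_boost(X_tr, y_tr)
train_pred, test_pred = predict(model, X_tr), predict(model, X_te)
test_r2 = r2_score(y_te, test_pred)
print(f"train R2={r2_score(y_tr, train_pred):.2f} RMSE={rmse(y_tr, train_pred):.2f}")
print(f"test  R2={test_r2:.2f} RMSE={rmse(y_te, test_pred):.2f}")
print("normalized feature importance:", [round(v, 3) for v in model["importance"]])
```

To apply the sketch to real measurements, replace `X` with the selected water quality parameters and `y` with the computed WQI. The printed scores from the synthetic data are not comparable to the paper's. The noise feature should receive near-zero importance here, which mirrors how the abstract reports NO₃-N, TDS and temperature contributing little.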
environmental sciences,green & sustainable science & technology,engineering, environmental