XGBoost Based Machine Learning Techniques for Water Quality Prediction

E. Edwin, S. P. Roshan, M. Thanka, V. Ebenezer, Stewart Kirubakaran S, R. Joy
DOI: https://doi.org/10.1109/ICCPCT58313.2023.10244964
2023-08-10
Abstract: Water is important to all living beings and is essential to our daily lives. It is used for drinking, irrigation, and manufacturing. As globalisation has increased, however, water has lost its purity, and scientific procedures are now employed to test its quality. In this study, a prediction model was developed to quantify the quality and state of water using machine learning techniques. XGBoost, Naive Bayes, and Support Vector Machine (SVM) are the methods used for water quality prediction. The dataset was obtained from Kaggle and pre-processed to eliminate null values and outliers. The Water Quality Index (WQI) value was calculated using an arithmetic approach that employs index weights of the parameters, and the water was labelled based on its WQI value. Before the imbalanced dataset is fed into the machine learning model for training and testing, the Synthetic Minority Over-sampling Technique (SMOTE) is used to balance it with respect to the WQI value and its class. Prediction is performed by feeding the most accurate parameters into the model, and the model's output is the water quality classification and its suitability for consumption.
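The WQI step described in the abstract can be illustrated with a minimal sketch of a weighted arithmetic index. The abstract does not give the parameters, weights, limits, or class cut-offs, so every name and number below (`PARAMS`, `quality_rating`, `wqi`, `label`, and the thresholds) is a hypothetical assumption chosen only to show the shape of the calculation, not the authors' actual settings.

```python
# Minimal sketch of a weighted arithmetic Water Quality Index (WQI).
# All parameters, weights, limits and cut-offs are illustrative assumptions;
# none of them come from the paper.
PARAMS = {
    # name: (weight, ideal value, permissible limit)
    "ph": (4.0, 7.0, 8.5),
    "turbidity": (3.0, 0.0, 5.0),
    "dissolved_solids": (2.0, 0.0, 500.0),
}


def quality_rating(value, ideal, limit):
    """Sub-index q_i: 0 at the ideal value, 100 at the permissible limit."""
    return 100.0 * abs(value - ideal) / (limit - ideal)


def wqi(sample):
    """Weighted arithmetic mean of the per-parameter sub-indices."""
    total_weight = sum(weight for weight, _, _ in PARAMS.values())
    weighted_sum = sum(
        weight * quality_rating(sample[name], ideal, limit)
        for name, (weight, ideal, limit) in PARAMS.items()
    )
    return weighted_sum / total_weight


def label(score):
    """Map a WQI score to a class label (hypothetical cut-offs)."""
    if score <= 50:
        return "good"
    if score <= 100:
        return "poor"
    return "unsuitable"


if __name__ == "__main__":
    sample = {"ph": 7.8, "turbidity": 2.0, "dissolved_solids": 300.0}
    score = wqi(sample)
    print(round(score, 2), label(score))
```

In this sketch a lower score means cleaner water; the labels produced this way would serve as the class column that is later balanced and passed to the classifiers.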
Environmental Science, Computer Science