Machine-Learning-Based Approach to Assessing Water Quality in a Specific Basin: the Case of Wujingang Basin

Shubo Zhang, Ruonan He, Qian Wang, Zhan Qu, Jinfeng Wang, Yanru Wang, Hongqiang Ren
DOI: https://doi.org/10.1021/acsestwater.3c00153
2023-01-01
ACS ES&T Water
Abstract: Multidimensional indicators of surface water are key to assessing the water quality. Cost and time could be saved if surface water can be accurately assessed by fewer indicators. Therefore, it is necessary to screen key water quality indicators for different basins. This study collected 35 water quality indicators (42 315 observations) along the Wujingang basin. Cluster analysis and correlation coefficients were used to identify homogeneous categories of water quality indicators. Frequent pattern mining (FPM) was used to remove redundant indicators. Finally, the water quality assessment after the removal of redundant indicators was validated by classification analysis. Results of the silhouette coefficient and within-cluster sum of squared errors indicated that K-means was the optimal clustering model. Concomitant indicators Pb, Cl⁻, NO₃⁻, TN, V, and Al were identified using FPM. Decision tree verified that the performance did not decrease after removing Pb, Cl⁻, V, NO₃⁻-N, and Al. These indicators were redundant for the Wujingang basin and could be monitored less frequently when there is no special use. This study provides important information for developing a selection framework based on multidimensional water quality data, which could serve as a baseline system for the selection of key water quality indicators in a specific basin.
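As an illustration of the frequent pattern mining step mentioned in the abstract, the following is a minimal sketch of support counting over co-occurring indicators. The abstract does not state which FPM algorithm was used or how measurements were turned into transactions, so everything here is an assumption: the function name `frequent_itemsets`, the brute-force counting approach, the toy samples, and the idea that each sample is represented as the set of indicators flagged as elevated.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items.

    Hypothetical helper: brute-force counting, adequate for a handful of
    indicators but not for large item universes.
    """
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # sort so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    # Support = fraction of transactions that contain the itemset.
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Toy data (invented for illustration, not taken from the paper):
# each set lists the indicators flagged as elevated in one sample.
samples = [
    {"Pb", "Al", "TN"},
    {"Pb", "Al"},
    {"Pb", "Al", "V"},
    {"TN"},
    {"Pb", "Al", "TN"},
]

result = frequent_itemsets(samples, min_support=0.5)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

In this toy data, Pb and Al always appear together, so the pair has the same support as either indicator alone. That is the kind of concomitant pattern that could suggest one indicator adds little information beyond the other, although the paper's actual redundancy criteria may differ.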