A Feature Extraction Method Based on the Entropy-Minimal Description Length Principle and GBDT for Common Surface Water Pollution Identification

Pingjie Huang, Lixiang Wang, Dibo Hou, Wangli Lin, Jie Yu, Guangxin Zhang, Hongjian Zhang
DOI: https://doi.org/10.2166/hydro.2021.060
IF: 3.058
2021-01-01
Journal of Hydroinformatics
Abstract: To effectively prevent river water pollution, water quality monitoring is necessary. However, existing water quality assessment methods characterize water quality conditions only to a limited extent, and few studies have focused on feature extraction for water pollution identification or on obtaining accurate information about pollution sources. This study therefore proposes a feature extraction method, based on the entropy-minimal description length principle and the gradient boosting decision tree (GBDT) algorithm, for identifying the type of surface water pollution while taking into account the distribution characteristics and intrinsic associations of conventional water quality indicators. To improve robustness to noise, we construct coarse-grained discretization features for each water quality indicator based on information entropy. The GBDT algorithm is then used to mine the nonlinear correlation between water quality indicators and pollution classes and to obtain tree-transformed features. Water samples collected by the Environmental Monitoring Center of a southern city were used to test the performance of the proposed algorithm. Experimental results demonstrate that features extracted by the proposed method are more effective than both the raw water quality indicators without feature engineering and features extracted by principal component analysis.
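The abstract does not spell out the discretization procedure, so the following is only a minimal sketch of what entropy-based discretization with a minimal description length stopping rule commonly looks like. It assumes the standard Fayyad-Irani criterion, which may differ from the authors' exact formulation. The function names (`entropy`, `best_cut`, `mdl_accepts`, `mdlp_cuts`, `discretize`) and the sample readings are hypothetical and are not taken from the paper.

```python
import math
from bisect import bisect_left
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, gain, left_labels, right_labels) for the binary cut with
    the highest information gain, or None if all values are identical."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut can only separate distinct values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if best is None or gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain, left, right)
    return best


def mdl_accepts(labels, gain, left, right):
    """Assumed Fayyad-Irani MDL test: accept the cut only if its gain
    exceeds the cost of encoding the extra partition."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cuts(values, labels):
    """Recursively collect the accepted cut points in ascending order."""
    found = best_cut(values, labels)
    if found is None:
        return []
    cut, gain, left, right = found
    if not mdl_accepts(labels, gain, left, right):
        return []
    pairs = list(zip(values, labels))
    lo = [p for p in pairs if p[0] <= cut]
    hi = [p for p in pairs if p[0] > cut]
    return (
        mdlp_cuts([v for v, _ in lo], [y for _, y in lo])
        + [cut]
        + mdlp_cuts([v for v, _ in hi], [y for _, y in hi])
    )


def discretize(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect_left(cuts, v) for v in values]


# Made-up readings of a single indicator, for illustration only.
readings = [0.1, 0.2, 0.3, 0.4, 0.5, 5.1, 5.2, 5.3, 5.4, 5.5]
classes = ["clean"] * 5 + ["polluted"] * 5
cuts = mdlp_cuts(readings, classes)
bins = discretize(readings, cuts)
print(cuts, bins)
```

In this toy data the two classes are cleanly separated, so a single cut between the groups passes the MDL test and each reading collapses to one of two coarse bins. When the labels carry no signal with respect to the indicator, the test rejects every cut and the indicator stays in a single bin, which is one way such a scheme can suppress noise.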