Prediction of flood risk levels of urban flooded points though using machine learning with unbalanced data

Hongfa Wang, Yu Meng, Hongshi Xu, Huiliang Wang, Xinjian Guan, Yuan Liu, Meng Liu, Zening Wu
DOI: https://doi.org/10.1016/j.jhydrol.2024.130742
IF: 6.4
2024-01-28
Journal of Hydrology
Abstract: With growing emphasis on preventing urban flooding and promoting rational urban development, more data on urban flooding are being collected. However, the sample sizes of these data are unbalanced, a phenomenon that is widespread in other fields as well. Unbalanced datasets compromise the performance of classification models, so minority-class samples (floods with higher risk) are often missed by alerts or incorrectly warned. To solve this problem, this research proposes a novel hybrid resampling method that proves effective for balancing data. First, the imbalanced dataset is optimized with the Borderline-SMOTE algorithm. Next, alternative datasets are synthesized through under-sampling techniques, and the quality of each is evaluated by its information entropy, calculated with the k-nearest neighbor entropy estimator. The proposed method not only makes full use of the information in the original data but also avoids the under-fitting caused by relying on a single under-sampling technique. In a practical application in the central area of Zhengzhou, China, the resampling method was combined with a Random Forest classification model optimized by a Genetic Algorithm. Compared with using no treatment, this combination yields significantly better results, with improvements in all assessment indicators (Accuracy, Recall, G-mean and F1-score).
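The first step of the hybrid method relies on Borderline-SMOTE, which oversamples only those minority (high-risk) samples that lie near the class boundary. The sketch below is a minimal, pure-Python illustration of the commonly described "Borderline-SMOTE1" variant. It is not the authors' implementation. The function name `borderline_smote` and the parameters `m` (neighbourhood size used to flag borderline points), `k` (minority neighbours used for interpolation), `n_per_danger` and `seed` are hypothetical choices made for this example.

```python
import math
import random


def borderline_smote(minority, majority, m=5, k=5, n_per_danger=1, seed=0):
    """Hypothetical Borderline-SMOTE1 sketch: oversample only minority points near the border.

    minority, majority: lists of equal-length numeric tuples.
    Returns a list of new synthetic minority samples (originals are not included).
    """
    rng = random.Random(seed)
    # Whole training set; minority points come first, so pool index i == minority index i.
    pool = [(p, True) for p in minority] + [(p, False) for p in majority]

    danger = []
    for i, p in enumerate(minority):
        # m nearest neighbours of p in the whole training set, excluding p itself
        neigh = sorted((j for j in range(len(pool)) if j != i),
                       key=lambda j: math.dist(p, pool[j][0]))[:m]
        n_major = sum(1 for j in neigh if not pool[j][1])
        # DANGER: at least half, but not all, neighbours are majority samples.
        # All-majority neighbourhoods are treated as noise; mostly-minority ones as safe.
        if m / 2 <= n_major < m:
            danger.append(i)

    synthetic = []
    for i in danger:
        p = minority[i]
        # k nearest *minority* neighbours of the borderline point
        nn = sorted((j for j in range(len(minority)) if j != i),
                    key=lambda j: math.dist(p, minority[j]))[:k]
        if not nn:
            continue
        for _ in range(n_per_danger):
            q = minority[rng.choice(nn)]
            gap = rng.random()  # uniform interpolation factor in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


minority = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.3, 5.1)]
majority = [(5.1, 4.8), (4.8, 5.2), (5.4, 4.7), (9.0, 9.0)]
new = borderline_smote(minority, majority, m=3, k=3)
print(len(new))  # 2: only (5.0, 5.0) and (5.3, 5.1) are borderline here
```

Focusing synthesis on borderline points, rather than interpolating around every minority sample as plain SMOTE does, is what makes the variant attractive when high-risk flood points overlap with lower-risk ones in feature space. For real data, a maintained library implementation would normally be preferred over this sketch.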
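The abstract evaluates each candidate under-sampled dataset by its information entropy, estimated with a k-nearest neighbor entropy estimator, but it does not give the estimator's exact form. The sketch below assumes the classic Kozachenko-Leonenko estimator with Euclidean distance, H ≈ ψ(N) − ψ(k) + log c_d + (d/N) Σ log r_i, where r_i is the distance from sample i to its k-th nearest neighbour and c_d is the volume of the d-dimensional unit ball. The function names `knn_entropy` and `digamma_int` are hypothetical, and the authors' variant (norm, choice of k, units) may differ.

```python
import heapq
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function for positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats.

    points: list of equal-length numeric tuples (N samples, d dimensions).
    """
    n, d = len(points), len(points[0])
    if n <= k:
        raise ValueError("need more than k samples")
    # log-volume of the d-dimensional Euclidean unit ball: pi^(d/2) / Gamma(d/2 + 1)
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    sum_log_r = 0.0
    for i, p in enumerate(points):
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = heapq.nsmallest(k, dists)[-1]  # distance to the k-th nearest neighbour
        sum_log_r += math.log(max(r_k, 1e-12))  # floor guards against duplicate points
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * sum_log_r / n


rng = random.Random(1)
sample = [(rng.gauss(0.0, 1.0),) for _ in range(500)]
print(knn_entropy(sample, k=3))  # should be close to 0.5 * ln(2*pi*e), about 1.419
```

Because the estimator needs only pairwise distances, it can score any resampled dataset without fitting a density model, which is presumably why a k-NN estimator suits comparing several candidate datasets. This brute-force version is O(N²) per call; a spatial index (for example a k-d tree) would be needed for large datasets.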
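The abstract reports Accuracy, Recall, G-mean and F1-score. The small sketch below shows why the last three matter for unbalanced data: a model that never flags the minority (high-risk) class can still score high Accuracy. The abstract does not say how the metrics are averaged over risk levels. This example assumes macro-averaged Recall and F1-score, and G-mean taken as the geometric mean of per-class recalls. The function name `imbalance_metrics` is hypothetical.

```python
import math
from collections import Counter


def imbalance_metrics(y_true, y_pred):
    """Accuracy, macro Recall, G-mean and macro F1-score for (multi-class) labels.

    Assumption: metrics are averaged over the classes present in y_true.
    """
    labels = set(y_true)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    actual = Counter(y_true)
    predicted = Counter(y_pred)

    recalls, f1s = [], []
    for c in labels:
        rec = tp[c] / actual[c]
        prec = tp[c] / predicted[c] if predicted[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        recalls.append(rec)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "recall": sum(recalls) / len(recalls),
        # geometric mean of per-class recalls; 0 if any class is never recalled
        "g_mean": math.prod(recalls) ** (1 / len(recalls)),
        "f1": sum(f1s) / len(f1s),
    }


# A "classifier" that always predicts the majority class on an 8:2 dataset
print(imbalance_metrics([0] * 8 + [1] * 2, [0] * 10))  # accuracy 0.8, but g_mean 0.0
```

G-mean drops to zero as soon as one risk level is never recalled, so it exposes exactly the missed high-risk warnings that the resampling method aims to reduce.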
Geosciences, Multidisciplinary; Water Resources; Engineering, Civil