Flood Susceptibility Assessment with Random Sampling Strategy in Ensemble Learning (RF and XGBoost)

Hancheng Ren, Bo Pang, Ping Bai, Gang Zhao, Shu Liu, Yuanyuan Liu, Min Li
DOI: https://doi.org/10.3390/rs16020320
IF: 5
2024-01-13
Remote Sensing
Abstract: Due to the complex interaction of urban and mountainous floods, assessing flood susceptibility in mountainous urban areas is a challenging task in environmental research and risk analysis. Data-driven machine learning methods can evaluate flood susceptibility in mountainous urban areas that lack essential hydrological data by using remote sensing data and limited historical inundation records. In this study, two ensemble learning algorithms, Random Forest (RF) and XGBoost, were adopted to assess the flood susceptibility of Kunming, a typical mountainous urban area prone to severe flood disasters. A flood inventory was created using flood observations from 2018 to 2022. The spatial database included 10 explanatory factors, encompassing climatic, geomorphic, and anthropogenic factors. Artificial Neural Network (ANN) and Support Vector Machine (SVM) were selected for model comparison. To minimize the influence of expert opinions on model training, this study employed a strategy of uniformly random sampling in historically non-flooded areas for negative sample selection. The results demonstrated that (1) ensemble learning algorithms offer higher accuracy than other machine learning methods, with RF achieving the highest accuracy, evidenced by an area under the curve (AUC) of 0.87, followed by XGBoost at 0.84, surpassing both ANN (0.83) and SVM (0.82); (2) the interpretability of ensemble learning highlighted the differences in the potential distribution of the training data's positive and negative samples; feature importance in ensemble learning can be used to minimize human bias in the collection of flooded-site samples, and more targeted flood susceptibility maps of the study area's road network were obtained; and (3) ensemble learning algorithms exhibited greater stability and robustness in datasets with varied negative samples, as evidenced by their performance in F1-Score, Kappa, and AUC metrics.
This paper further substantiates the superiority of ensemble learning in flood susceptibility assessment tasks from the perspectives of accuracy, interpretability, and robustness, enhances the understanding of the impact of negative samples on such assessments, and optimizes the specific process for urban flood susceptibility assessment using data-driven methods.
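The abstract reports model quality in terms of F1-Score, Kappa, and AUC. As a reminder of what these three numbers measure, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def f1_and_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (F1, Cohen's kappa) for binary labels: 1 = flooded, 0 = non-flooded."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("no samples given")

    # F1 is the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # Kappa compares observed agreement with agreement expected by chance.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return f1, kappa


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration but slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): three flooded and three non-flooded sites.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]      # hypothetical susceptibility scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1, kappa = f1_and_kappa(y_true, y_pred)
area = auc(y_true, scores)
```

F1 and Kappa depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason studies often report all three together.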
environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary
What problem does this paper attempt to address?
The paper addresses the problem of assessing flood susceptibility in mountainous urban areas. Specifically, it examines how machine learning methods, particularly ensemble learning algorithms, can use remote sensing data and limited historical inundation records to evaluate flood susceptibility where essential hydrological data are lacking. The paper pursues three main goals:

1. **Improving assessment accuracy**: By comparing the ensemble learning algorithms Random Forest (RF) and XGBoost with other machine learning methods, namely Artificial Neural Network (ANN) and Support Vector Machine (SVM), the paper verifies the advantages of ensemble learning in flood susceptibility assessment.
2. **Reducing the influence of expert opinions**: Negative samples are selected by uniformly random sampling in historically non-flooded areas, which reduces the impact of subjective judgment on model training.
3. **Enhancing model interpretability and stability**: Feature importance analysis is used to understand the differences in the potential distribution of positive and negative samples, to reduce human bias in the training data, and to generate more targeted flood susceptibility maps. The stability and robustness of the ensemble learning algorithms are also verified across datasets with varied negative samples.

Overall, the paper aims to optimize the flood susceptibility assessment process in mountainous urban areas through a data-driven approach, improving the accuracy and reliability of the assessment.
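The negative-sample strategy described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the study area is represented as a regular grid of cells, and the function name `sample_negative_cells`, the grid size, and the flooded-cell coordinates are all hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column) index of a grid cell


def sample_negative_cells(
    n_rows: int,
    n_cols: int,
    flooded: Set[Cell],
    n_samples: int,
    seed: Optional[int] = None,
) -> List[Cell]:
    """Draw negative samples uniformly at random from cells with no flood record.

    Every historically non-flooded cell has the same chance of being chosen,
    so no expert judgment enters the selection.
    """
    candidates = [
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if (r, c) not in flooded
    ]
    if n_samples > len(candidates):
        raise ValueError("more samples requested than non-flooded cells available")
    rng = random.Random(seed)  # local generator keeps draws reproducible
    return rng.sample(candidates, n_samples)  # sampling without replacement


# Toy example (invented data): a 4 x 5 grid with two recorded flood cells.
flooded_cells = {(0, 0), (1, 2)}
negatives_a = sample_negative_cells(4, 5, flooded_cells, n_samples=6, seed=1)
negatives_b = sample_negative_cells(4, 5, flooded_cells, n_samples=6, seed=2)
```

Repeating the draw with different seeds, as with `negatives_a` and `negatives_b`, yields several training sets that differ only in their negative samples. Retraining a model on each set is one way to probe how sensitive it is to the choice of negatives.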