Debris-flow susceptibility assessment in Dongchuan using stacking ensemble learning including multiple heterogeneous learners with RFE for factor optimization
Kun Li, Junsan Zhao, Yilin Lin
DOI: https://doi.org/10.1007/s11069-023-06099-3
IF: 3.158
2023-08-09
Natural Hazards
Abstract: An accurate assessment of debris-flow susceptibility is of great importance to the prevention and control of debris-flow disasters in mountainous areas. In this study, debris-flow susceptibility is assessed with high accuracy by combining recursive feature elimination with random forest (RFE-RF) and stacking ensemble learning built on multiple heterogeneous learners. The study area is the Dongchuan District, Kunming City, Yunnan Province, China, where debris flows are prone to occur. Taking the grid unit as the assessment unit, 22 debris-flow hazard factors are preliminarily selected from multiple data sources, such as geology, topography, and precipitation, based on the interpretation of debris-flow points. Next, a total of 16 factors are selected from these primary factors to construct the hazard factor system, using the RFE-RF method, contribution rate, and Pearson correlation analysis. Finally, the base learners of the ensemble model are selected using accuracy and diversity metrics, and a stacking ensemble learning model for debris-flow susceptibility assessment, which combines the strengths and differences of the individual learners, is constructed to quantitatively analyze debris-flow susceptibility in the study area. The natural breakpoint method is used to classify each grid unit into one of five susceptibility levels. The prediction performance of the stacking ensemble is compared with that of four base learners, namely support vector machine, back propagation neural network, extreme gradient boosting tree, and random forest (RF), and with four other ensemble strategies, namely simple average, weighted average, weighted vote, and blending. The results indicate that the very low and low susceptibility zones are mainly concentrated in the eastern and western parts of the study area. The very high and high susceptibility zones are mainly distributed along the two banks of the Xiaojiang River Valley and the south bank of the Jinsha River, where the geological environment is fragile and the risk is high. The medium susceptibility zone is mainly distributed around the very high and high susceptibility zones. When combined with the RFE-RF model and the diversity measurement, the stacking ensemble learning model shows excellent accuracy and stability for debris-flow susceptibility assessment in mountainous areas. Among the compared models, stacking ensemble learning achieves the highest area under the receiver operating characteristic curve (AUC), accuracy, and F1 score, reaching 95.6%, 88.6%, and 88.9%, respectively, and the lowest root mean square error, 0.287, which indicates that stacking ensemble learning with multiple heterogeneous learners is a high-performance model for debris-flow susceptibility assessment. The findings can provide a scientific basis for disaster prevention and mitigation in mountainous areas.
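The simpler ensemble strategies and the threshold-based metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, and the choice to compute the root mean square error between predicted probabilities and 0/1 labels are all assumptions made for illustration, and stacking, blending, and AUC are not covered.

```python
import math


def simple_average(probs):
    """Mean of the base learners' predicted probabilities for one grid unit."""
    return sum(probs) / len(probs)


def weighted_average(probs, weights):
    """Probability average in which each base learner has its own weight."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


def weighted_vote(probs, weights, threshold=0.5):
    """Weighted share of base learners voting 'debris flow' (assumed threshold)."""
    positive = sum(w for p, w in zip(probs, weights) if p >= threshold)
    return positive / sum(weights)


def evaluate(y_true, y_prob, threshold=0.5):
    """Accuracy, F1 score, and RMSE for binary labels and predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    denom = 2 * tp + fp + fn
    return {
        "accuracy": correct / len(y_true),
        "f1": 2 * tp / denom if denom else 0.0,
        # Assumption: RMSE is taken between probabilities and 0/1 labels.
        "rmse": math.sqrt(
            sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
        ),
    }
```

In this sketch the weights would typically come from each base learner's validation performance; how the paper derives them is not stated in the abstract.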
geosciences, multidisciplinary; water resources; meteorology & atmospheric sciences