Impact of sampling for landslide susceptibility assessment using interpretable machine learning models
Bin Wu, Zhenming Shi, Hongchao Zheng, Ming Peng, Shaoqiang Meng
DOI: https://doi.org/10.1007/s10064-024-03980-8
IF: 4.2
2024-10-27
Bulletin of Engineering Geology and the Environment
Abstract: Landslide susceptibility assessment has made significant strides in meeting the urgent requirements of disaster prevention and mitigation. However, the inherent imbalance in landslide distributions poses challenges, and various sampling strategies have emerged in response. These strategies alter the original dataset distribution, so their impact on susceptibility mapping needs to be better understood. This study integrates multi-source information, including morphological, geological, hydrological, and land-use data from northwestern Oregon, to train four models (Decision Trees, Random Forest, AdaBoost, and Gradient Tree Boosting) using both balanced and imbalanced training sets. Results reveal that models trained on imbalanced datasets generally exhibit superior classification performance. Models trained on balanced datasets predict more positives (landslides) at higher susceptibility levels, while those trained on imbalanced datasets classify more negatives at lower levels. Using the Shapley Additive Explanations (SHAP) method, the study establishes the consistency of model decision-making and identifies the five most influential factors: distance to roads, slope roughness, geological age, roughness, and elevation. Furthermore, the consequences of false negatives (FN) and false positives (FP) are discussed: FN may lead to loss of life, whereas FP may result from prediction inaccuracies, dataset incompleteness, or forthcoming landslides, so a certain number of FP can be tolerated. The study suggests that models trained on balanced datasets are preferable for minimizing the number of FN and for effectively capturing landslides in high and very high susceptibility areas. The findings provide valuable insights into the impact of the ratio of positives to negatives on landslide susceptibility and offer support for optimizing dataset sampling.
engineering, environmental, geosciences, multidisciplinary, geological
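
The comparison described in the abstract (the same four model families trained on an imbalanced and on a balanced training set, then compared by FN and FP counts) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: the data are synthetic (generated with scikit-learn's make_classification rather than taken from the Oregon dataset), balancing is done by random undersampling of negatives (the abstract does not say which sampling strategy was used), and the helper names undersample_negatives and compare_sampling as well as all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def undersample_negatives(X, y, rng):
    """Randomly drop negatives until positives and negatives are equal in number.

    Assumption: random undersampling stands in for whatever balancing
    strategy the paper actually used.
    """
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    keep_neg = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, keep_neg])
    rng.shuffle(idx)
    return X[idx], y[idx]


def compare_sampling(seed=0):
    """Train four tree-based models on imbalanced and balanced training sets.

    Returns a dict keyed by (model name, training set label) holding the
    FN and FP counts on a shared, untouched (imbalanced) test set.
    """
    rng = np.random.default_rng(seed)

    # Synthetic stand-in for landslide (1) vs. non-landslide (0) samples,
    # with roughly 10% positives. Not the paper's data.
    X, y = make_classification(
        n_samples=2000, n_features=10, n_informative=5,
        weights=[0.9, 0.1], random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    X_bal, y_bal = undersample_negatives(X_tr, y_tr, rng)

    # Hypothetical hyperparameters, chosen only to keep the sketch fast.
    models = {
        "DecisionTree": lambda: DecisionTreeClassifier(max_depth=5, random_state=seed),
        "RandomForest": lambda: RandomForestClassifier(n_estimators=100, random_state=seed),
        "AdaBoost": lambda: AdaBoostClassifier(n_estimators=50, random_state=seed),
        "GradientBoosting": lambda: GradientBoostingClassifier(random_state=seed),
    }
    training_sets = {"imbalanced": (X_tr, y_tr), "balanced": (X_bal, y_bal)}

    results = {}
    for name, make_model in models.items():
        for label, (X_fit, y_fit) in training_sets.items():
            model = make_model().fit(X_fit, y_fit)
            tn, fp, fn, tp = confusion_matrix(
                y_te, model.predict(X_te), labels=[0, 1]
            ).ravel()
            results[(name, label)] = {"FN": int(fn), "FP": int(fp)}
    return results


if __name__ == "__main__":
    for key, counts in compare_sampling().items():
        print(key, counts)
```

Evaluating both variants on the same untouched test set keeps the FN and FP counts comparable; the counts obtained on synthetic data say nothing about the paper's reported results.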