Comparative study of sampling strategies for machine learning-based landslide susceptibility assessment
Xiao-Dong Liu, Ting Xiao, Shao-He Zhang, Ping-He Sun, Lei-Lei Liu, Zu-Wu Peng
DOI: https://doi.org/10.1007/s00477-024-02841-w
IF: 3.821
2024-11-20
Stochastic Environmental Research and Risk Assessment
Abstract: To explore how sampling strategy affects landslide susceptibility assessment, this study takes landslides in Taojiang County, Hunan Province, China, as the research object. Three methods for sampling positive (landslide) samples (discrete point sampling, buffer integral sampling, and buffer discrete sampling) were combined with three sampling resolutions (30 m, 60 m, and 90 m cell size) to give nine sampling strategies. Three machine learning algorithms (extreme gradient boosting, random forest, and a convolutional neural network) were used to build landslide susceptibility models under each sampling strategy, and the performance of each model was evaluated using the Receiver Operating Characteristic (ROC) curve. Model accuracy is highest under the buffer discrete sampling strategy and lowest under the buffer integral sampling strategy. Performance fluctuates greatly across sampling strategies at the 30 m cell size, while the models are more stable at the 90 m cell size. These results indicate that the sampling strategy has a significant impact on the performance of landslide susceptibility models, and that some sampling strategies exhibit overfitting and false segmentation. The area under the ROC curve (AUC) alone cannot fully reflect the quality of a model: although the AUC of the buffer integral sampling strategy is not high, it avoids the overfitting and false segmentation problems, making its results more reliable.
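As an illustration of the experimental design described in the abstract, the following is a minimal sketch of how the nine sampling strategies arise (three sampling methods crossed with three cell sizes) and how an AUC value can be computed from predicted susceptibility scores. It is not the authors' code: the identifiers `METHODS`, `CELL_SIZES_M`, `sampling_strategies`, and `roc_auc` are hypothetical names chosen for this example, and the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than any particular library.

```python
from itertools import product

# Hypothetical labels for the three positive-sample sampling methods
# and the three cell sizes (in metres) named in the abstract.
METHODS = ("discrete_point", "buffer_integral", "buffer_discrete")
CELL_SIZES_M = (30, 60, 90)


def sampling_strategies():
    """Return every (method, cell size) pair: 3 x 3 = 9 strategies."""
    return list(product(METHODS, CELL_SIZES_M))


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise definition.

    AUC equals the probability that a randomly chosen positive sample
    (label 1, landslide) receives a higher score than a randomly chosen
    negative sample (label 0, non-landslide); ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for method, cell in sampling_strategies():
        print(f"{method} @ {cell} m")
    # Toy scores, made up for illustration only (not data from the study).
    print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1]))
```

A single AUC value summarizes ranking quality only; as the abstract notes, it does not reveal overfitting or false segmentation, so it would normally be read alongside the susceptibility map itself.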
environmental sciences, engineering, environmental, water resources, civil, statistics & probability