Study on the influence of input variables on the supervised machine learning model for landslide susceptibility mapping

Peng Lai, Fei Guo, Xiaohu Huang, Dongwei Zhou, Li Wang, Guangfu Chen
DOI: https://doi.org/10.1007/s12665-024-11501-9
IF: 2.8
2024-03-12
Environmental Earth Sciences
Abstract: Supervised machine learning (ML) models are currently popular in landslide susceptibility mapping (LSM). However, the input variables of these models have inherent limitations: raw input variables lack a nonlinear relationship with landslides, and discrete and frequency ratio input variables require continuous environmental factors to be discretized, which loses a significant amount of information. To address these issues, this paper adopts a new method, the neighborhood frequency ratio, for obtaining input variables. The study compared four types of input variables and seven supervised ML models under 28 conditions, using ROC (receiver operating characteristic) curves to evaluate the prediction results. The AUC (area under the curve) values, ranging from 0.8223 to 0.9928, show that the choice of input variables is very important to the evaluation model. The experimental results were analyzed from the perspective of algorithm principles and data characteristics. The main conclusions are as follows: (1) For non-tree models (i.e., models other than tree models), the neighborhood frequency ratio of environmental factors should be used as the model input. (2) For tree models (i.e., decision trees and decision-tree-based integrated models), the raw values of environmental factors can be used directly as the input of the LSM model. (3) The decision-tree-based integrated models yielded better prediction results.
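The frequency ratio input mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classical definition: for each class of a discretized factor, the share of landslide cells in that class divided by the share of all cells in that class. The paper's neighborhood frequency ratio is not specified in the abstract and is not reproduced here. The function name, bin edges, and sample values are hypothetical.

```python
from bisect import bisect_right

def frequency_ratio(factor, landslide, bin_edges):
    """Classical frequency ratio per class of one discretized factor.

    factor:    continuous factor value per cell (e.g. slope in degrees)
    landslide: truthy where the cell is a landslide cell
    bin_edges: ascending class boundaries (hypothetical, chosen by the user)
    Returns (FR per class, FR value mapped back to each cell).
    """
    n_classes = len(bin_edges) - 1
    # Class index per cell; values outside the edges fall into the end classes.
    classes = [min(max(bisect_right(bin_edges, v) - 1, 0), n_classes - 1)
               for v in factor]
    cells = [0] * n_classes
    slides = [0] * n_classes
    for c, is_slide in zip(classes, landslide):
        cells[c] += 1
        slides[c] += 1 if is_slide else 0
    total_cells = len(classes)
    total_slides = sum(slides)
    fr = []
    for c in range(n_classes):
        if cells[c] == 0 or total_slides == 0:
            fr.append(0.0)  # assumption: empty classes get FR = 0
        else:
            fr.append((slides[c] / total_slides) / (cells[c] / total_cells))
    return fr, [fr[c] for c in classes]

# Six hypothetical cells, three slope classes: [0, 10), [10, 20), [20, 40]
fr, per_cell = frequency_ratio(
    [5, 15, 25, 35, 12, 18], [0, 1, 0, 0, 1, 0], [0, 10, 20, 40]
)
```

Every cell in a class receives the same value, which is the information loss from discretization that the abstract refers to.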
environmental sciences; water resources; geosciences, multidisciplinary