Flood susceptibility mapping in an arid region of Pakistan through ensemble machine learning model

Yaseen Andaleeb, Lu Jianzhong, Chen Xiaoling
DOI: https://doi.org/10.1007/s00477-022-02179-1
IF: 3.821
2022-01-01
Stochastic Environmental Research and Risk Assessment
Abstract: Floods are among the most destructive natural hazards, so their prediction is pivotal for flood management and public safety. The factors contributing to flooding differ between watersheds because they depend on each watershed's characteristics. This study therefore evaluated the factors contributing to flooding and located the regions of high and very high flood susceptibility in Karachi. A new ensemble model (LR-SVM-MLP) is introduced to develop the susceptibility map and evaluate the influencing factors. It was formed by applying a stacking ensemble to Logistic Regression (LR), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) models. A spatial database was generated for the Karachi watershed, comprising twelve conditioning factors as independent variables and, as the dependent variable, 652 flood points and the same number of non-flood points. The data were randomly divided into 70% for training and 30% for validation. A multicollinearity test was applied to analyse collinearity among the factors, and the Information Gain Ratio was used to assess each variable's predictive power. After training, the models were evaluated on various statistical measures and compared with benchmark models. The proposed ensemble model outperformed the individual LR, SVM, and MLP models and produced a precise and accurate map, with 99% accuracy on the training dataset and 98% on the testing dataset. The ensemble model can be used by flood management authorities and the government, and can contribute to future research studies.
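The stacking setup described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic (generated with `make_classification` to mimic 1,304 points and twelve factors), and the hyperparameters, the feature scaling, and the choice of logistic regression as the meta-learner are all assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the spatial database (NOT the Karachi data):
# 1,304 points (about half "flood", half "non-flood"), twelve factors.
X, y = make_classification(
    n_samples=1304,
    n_features=12,
    n_informative=8,
    n_redundant=2,
    weights=[0.5, 0.5],
    random_state=0,
)

# Random 70% / 30% split for training and validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Base learners: LR, SVM, and MLP. Scaling and hyperparameters are
# illustrative assumptions, not values from the paper.
base_learners = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    )),
]

# Stacking: out-of-fold outputs of the base learners (5-fold CV) become
# the inputs of a meta-learner. The meta-learner here is an assumption.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# Accuracy on the held-out 30%, and the predicted flood probability
# per point, which is what a susceptibility map would be drawn from.
test_accuracy = stack.score(X_test, y_test)
susceptibility = stack.predict_proba(X_test)[:, 1]
print(f"held-out accuracy: {test_accuracy:.3f}")
```

With real data, `X` would hold the twelve conditioning factors sampled at each flood and non-flood point, and `susceptibility` would be computed for every raster cell of the watershed rather than for the held-out points only.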