Landslide Hazard Risk Mapping Based on Random Forest Classification in Ruijin, Jiangxi, China

Zhou Xiaoting, Weicheng Wu, Ziyu Lin, Guiliang Zhang, Renxiang Chen, Song Yong, Wang Zhiling, Lang Tao, Ou Penghui, Huangfu Wenchao, Zhang Yang, Xie Lifeng, Xiaolan Huang, Yaozu Qin, Shanling Peng, Shao Chongjian
DOI: https://doi.org/10.5194/egusphere-egu2020-8278
2020-01-01
Abstract: Landslides are common geological hazards that not only disrupt road traffic but also pose a serious threat to human lives and property. This study aims to map landslide hazard risk using a Random Forest Classification (RFC) approach, taking Ruijin County in Jiangxi, China, as an example. Multi-source data were employed, including terrain (DEM, slope and aspect), precipitation, the normalized difference vegetation index (NDVI) representing vegetation condition and abundance, strata and their lithology, distance to roads, distance to rivers, distance to faults, thickness of the weathering crust, and soil type and texture. Non-numeric data such as geological strata, soil units and faults were spatialized and assigned values according to their susceptibility to landslides. Similarly, linear features such as roads, rivers and faults were buffered at distances of 0-30, 30-60, 60-90 and 90-120 m, and each buffer zone was assigned a landslide susceptibility value. For example, the 0-30, 30-60, 60-90 and 90-120 m road buffer zones were assigned values of 10, 7, 4 and 1, respectively, meaning that the closer a location is to a road, the higher its landslide risk. In total, 16 hazard factor layers were derived and converted into rasters. A total of 156 landslides (points) that had actually occurred and had been verified in the field were used, through a buffering-based rasterization procedure, to create a training set (TS, 70% of the landslides) and a validation set (VS, 30%). A number of polygons were also delineated in places where landslides are unlikely to occur, e.g., water bodies, zero-slope plains and urban areas, and these were added to the TS as the non-risk class. RFC was then run to model the probability of landslide risk, using the 16 factor layers as predictors and the TS for training. The resulting RF model was applied back to the 16 factor layers to predict the probability of landslide risk at each pixel across the whole county.
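Two of the preprocessing steps described above, the distance-based scoring of road buffer zones and the 70/30 division of the 156 landslide points, can be illustrated with a minimal sketch. The function names `road_buffer_score` and `split_landslides` are hypothetical. The code assumes that each zone is half-open (a pixel exactly 30 m from a road falls in the 30-60 m zone) and that pixels farther than 120 m receive a score of 0. The abstract specifies neither of these conventions. The split is a plain random split of point IDs and is only a stand-in for the authors' buffering-based rasterization procedure.

```python
import random
from bisect import bisect_right

# Upper bounds (m) of the road buffer zones and the susceptibility value
# assigned to each zone, as listed in the abstract (10, 7, 4, 1).
ROAD_ZONE_EDGES = [30, 60, 90, 120]
ROAD_ZONE_SCORES = [10, 7, 4, 1]
# Assumption: pixels 120 m or more from a road get 0 (not stated in the abstract).
OUTSIDE_SCORE = 0


def road_buffer_score(distance_m: float) -> int:
    """Map a pixel's distance to the nearest road onto its buffer-zone score.

    Zones are treated as half-open intervals [0, 30), [30, 60), [60, 90), [90, 120);
    this boundary convention is an assumption.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    zone = bisect_right(ROAD_ZONE_EDGES, distance_m)  # index of the zone containing distance_m
    return ROAD_ZONE_SCORES[zone] if zone < len(ROAD_ZONE_SCORES) else OUTSIDE_SCORE


def split_landslides(point_ids, train_fraction=0.7, seed=0):
    """Randomly divide landslide point IDs into training and validation lists."""
    ids = list(point_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


if __name__ == "__main__":
    print([road_buffer_score(d) for d in (5, 45, 75, 100, 150)])  # [10, 7, 4, 1, 0]
    train, val = split_landslides(range(156))
    print(len(train), len(val))  # 109 47
```

In practice the scoring would be applied to a whole distance raster rather than to single values, but the zone lookup is the same.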
Checking the prediction map against the VS gave an Overall Accuracy of 92.18% and a Kappa coefficient of 0.8432. It also showed that landslide-prone areas are mainly distributed along both sides of roads. The results reveal that extremely high-risk zones, with a probability greater than 0.9, cover 76.70 km² of the county. Among all factors, distance to roads is the most important, followed by precipitation: road construction and housing development cut into slopes and destabilize the weathered crust, and heavy rainfall then triggers the failure. Our study shows that the RFC prediction is highly accurate and in good agreement with field observations.
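The two accuracy figures reported above, Overall Accuracy and the Kappa coefficient, are standard confusion-matrix statistics. The threshold-based area of the extremely high-risk zone is a simple pixel count. The sketch below shows how each could be computed. The function names are hypothetical, the 30 m pixel size is an assumption because the abstract does not give the raster resolution, and the example matrix is illustrative rather than the study's data.

```python
from typing import Sequence, Tuple


def overall_accuracy_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are reference (validation) classes; columns are predicted classes.
    """
    k = len(confusion)
    if any(len(row) != k for row in confusion):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def area_above_threshold_km2(probabilities, threshold=0.9, pixel_size_m=30.0):
    """Area (km²) of pixels whose predicted probability is strictly above threshold.

    pixel_size_m is an assumed raster resolution, not a value from the study.
    """
    n_pixels = sum(1 for p in probabilities if p > threshold)
    return n_pixels * pixel_size_m ** 2 / 1e6  # m² -> km²


if __name__ == "__main__":
    oa, kappa = overall_accuracy_and_kappa([[45, 5], [5, 45]])
    print(oa, kappa)  # about 0.9 and 0.8
    print(area_above_threshold_km2([0.95, 0.91, 0.5, 0.9]))  # about 0.0018
```

Kappa discounts the agreement expected by chance from the class marginals. For this reason it is lower than Overall Accuracy whenever the classes are unevenly predicted or chance agreement is substantial.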