Probabilistic Machine Learning for Predicting Desiccation Cracks in Clayey Soils
Babak Jamhiri, Yongfu Xu, Mahdi Shadabfar, Susanga Costa
DOI: https://doi.org/10.1007/s10064-023-03366-2
IF: 4.2
2023-01-01
Bulletin of Engineering Geology and the Environment
Abstract: With frequent heatwaves and drought-downpour cycles, climate change gives rise to severe desiccation cracks. In this research, a probabilistic machine learning (ML) framework is developed to improve on deterministic models. A set of soil and environmental parameters, including initial water content (IWC), crack water content (CWC), final water content (FWC), soil layer thickness (SLT), temperature (Temp), and relative humidity (RH), is used as inputs to predict the crack surface ratio (CSR). A comprehensive set of ML models, including tree-based regressors (random forests [RF] and regression trees [RT]), gradient-boosted trees (GBT and XGBT), support-vector machines (SVM), and an artificial neural network coupled with particle swarm optimization (ANN-PSO), is developed for the predictions. Monte Carlo simulation (MCS) is then employed to introduce uncertainty into the models by shuffling and randomizing samples. Two sensitivity analyses, namely input exclusion and partial dependence-individual conditional expectation plots, are further conducted to assess the prediction reliability. Results indicate that the performance ranking of the developed models is SVM > GBT > XGBT > ANN-PSO > RF > RT. However, according to the probabilistic modeling based on the MCS, GBTs are highly capable of prediction, with the lowest errors and uncertainties. Ranked by higher coefficient of determination and lower standard deviation, the models follow the order GBT > SVM > XGBT > RF > ANN-PSO > RT. The sensitivity analyses also rank the parameter importance as FWC > CWC > SLT > IWC > Temp > RH. These findings demonstrate the capabilities of probabilistic ML under uncertainty, which measures prediction error variances and thereby improves performance precision.
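The Monte Carlo step described in the abstract (shuffling and randomizing samples, then summarizing the coefficient of determination by its mean and standard deviation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic, a one-variable least-squares line stands in for the paper's ML models, and the function names, number of runs, and test fraction are hypothetical rather than taken from the study.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for an ML regressor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def monte_carlo_r2(xs, ys, n_runs=200, test_fraction=0.3, seed=0):
    """Shuffle the samples, re-split, refit, and score on every run."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(2, int(len(xs) * test_fraction))
    scores = []
    for _ in range(n_runs):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [a + b * xs[i] for i in test_idx]
        scores.append(r_squared([ys[i] for i in test_idx], preds))
    # The mean summarizes accuracy; the standard deviation summarizes uncertainty.
    return statistics.fmean(scores), statistics.stdev(scores)


# Synthetic placeholder data: a noisy linear trend, not measurements from the paper.
data_rng = random.Random(42)
x_values = [data_rng.uniform(0.0, 1.0) for _ in range(120)]
y_values = [2.0 + 3.0 * x + data_rng.gauss(0.0, 0.2) for x in x_values]

mean_r2, std_r2 = monte_carlo_r2(x_values, y_values)
print(f"mean R^2 = {mean_r2:.3f}, std = {std_r2:.3f}")
```

Under this reading, a model with a higher mean and a lower standard deviation of the score across runs would rank better, which is the criterion the abstract states for the probabilistic ordering.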