Comparative Analysis of Seven Machine Learning Algorithms and Five Empirical Models to Estimate Soil Thermal Conductivity
Tianyue Zhao, Shuchao Liu, Jia Xu, Hailong He, Dong Wang, Robert Horton, Gang Liu
DOI: https://doi.org/10.1016/j.agrformet.2022.109080
IF: 6.2
2022-01-01
Agricultural and Forest Meteorology
Abstract: Soil thermal conductivity (λ) is an important thermal property that is crucial for surface energy balance and water balance studies. A total of 1602 measured λ values representing 189 soils were used to evaluate five empirical models (i.e., the de Vries (1963) model (de Vries 1963), the Campbell (1985) model (Campbell 1985), the Johansen (1975) model (Johansen 1975), the Côté and Konrad (2005) model (Côté and Konrad 2005), and the Lu et al. (2007) model (Lu 2007)) and seven machine learning (ML) algorithms (i.e., Decision Tree (DT), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Linear Regression (LR), K-Nearest Neighbors (KNN), Neural Network (NN), and Gaussian Process (GP)) for estimating λ. Our results demonstrated that the average root mean squared error (RMSE) of the ML algorithms was 66% and 82% of that of the empirical models on the validation and test sets, respectively. The three best ML algorithms (GBDT, NN, RF) performed significantly better than the three best empirical models (Lu 2007, Côté and Konrad 2005, Johansen 1975): 0.183 < RMSE < 0.259 W m⁻¹ K⁻¹ for the ML algorithms and 0.293 < RMSE < 0.320 W m⁻¹ K⁻¹ for the empirical models. For ML, we recommend the GBDT, NN, and RF algorithms. For empirical models, we recommend using the three normalized models (Lu 2007, Côté and Konrad 2005, Johansen 1975) over the physically based model (DV1963, de Vries 1963) and the regression model (CG1985, Campbell 1985). The feature importance rankings produced by the RF and GBDT algorithms show that soil moisture content and soil bulk density are the most critical factors affecting λ; together they account for more than 80% of the total feature importance for λ. RF gives more consistent feature importance rankings than GBDT; therefore, we recommend RF for feature selection.
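The three recommended normalized models share the Johansen (1975) structure λ = λ_dry + Ke(λ_sat − λ_dry), where Ke is the Kersten number (a function of the degree of saturation) and λ_dry and λ_sat are the dry and saturated conductivities; they differ mainly in how Ke and λ_dry are parameterized. The sketch below, a minimal illustration and not the authors' code, implements that structure with Johansen's Ke and λ_dry relations and the RMSE metric used in the comparison. The function names, the default values for solids conductivity (3.0 W m⁻¹ K⁻¹), water conductivity (0.6 W m⁻¹ K⁻¹) and particle density (2650 kg m⁻³), and the clipping of Ke to [0, 1] are all assumptions of this sketch, not values taken from the paper.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the inputs (here W m^-1 K^-1)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def johansen_kersten(sr: float, fine: bool) -> float:
    """Kersten number Ke for unfrozen soil after Johansen (1975).

    Fine soils: Ke = log10(Sr) + 1; coarse soils: Ke = 0.7 log10(Sr) + 1.
    Clipping to [0, 1] outside the stated validity range is an assumption here.
    """
    if sr <= 0.0:
        return 0.0
    slope = 1.0 if fine else 0.7
    return max(0.0, min(1.0, slope * math.log10(sr) + 1.0))


def normalized_lambda(theta: float, rho_b: float, fine: bool,
                      lambda_s: float = 3.0,   # assumed solids conductivity, W m^-1 K^-1
                      lambda_w: float = 0.6,   # assumed water conductivity, W m^-1 K^-1
                      rho_s: float = 2650.0    # assumed particle density, kg m^-3
                      ) -> float:
    """Hypothetical helper: lambda = lambda_dry + Ke * (lambda_sat - lambda_dry).

    theta: volumetric water content (m^3 m^-3); rho_b: dry bulk density (kg m^-3).
    """
    if not 0.0 < rho_b < rho_s:
        raise ValueError("bulk density must be positive and below particle density")
    n = 1.0 - rho_b / rho_s                    # porosity
    sr = min(max(theta, 0.0) / n, 1.0)         # degree of saturation, capped at 1
    # Johansen's empirical dry conductivity for natural soils (rho_b in kg m^-3)
    lam_dry = (0.135 * rho_b + 64.7) / (2700.0 - 0.947 * rho_b)
    # Saturated conductivity as a geometric mean of solids and water conductivities
    lam_sat = lambda_s ** (1.0 - n) * lambda_w ** n
    return lam_dry + johansen_kersten(sr, fine) * (lam_sat - lam_dry)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; these are not data from the study
    for theta in (0.0, 0.1, 0.2, 0.3, 0.4):
        print(theta, round(normalized_lambda(theta, 1300.0, fine=True), 3))
```

Swapping `johansen_kersten` and the `lam_dry` line for the Côté and Konrad (2005) or Lu et al. (2007) parameterizations would give the other two normalized models. Applying `rmse` to the same held-out measurements for such an estimator and for a fitted ML regressor yields the kind of comparison summarized in the abstract.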