Estimation on Total Phosphorus of Agriculture Soil in China: a New Sight with Comparison of Model Learning Methods

Chen Ying,Jia Jiepeng,Wu Caicong,Ramirez-Granada Lina,Li Gang
DOI: https://doi.org/10.1007/s11368-022-03374-x
IF: 3.5361
2022-01-01
Journal of Soils and Sediments
Abstract: Although soil total phosphorus (TP) is a primary, essential macroelement reflecting soil fertility in agricultural ecosystems, studies on TP model development and on TP differences between wheat and paddy lands after a long cultivation history at a regional scale are still limited. Hence, comparison models of TP were built with different learning methods and datasets, and the relationships between environmental factors and TP were discussed. TP in agricultural systems under long-term wheat or paddy cultivation was investigated, and the regressions between TP and climate parameters (air temperature, precipitation, humidity, and atmospheric pressure) and latitude were analyzed. Model development was compared across six learning methods, including one statistical learning method (linear) and five machine learning methods (support vector regression, decision tree, random forest, XGBoost, and LightGBM), and across two datasets (0–20 and 0–170-cm soil layers). The models were evaluated by the root mean squared error (RMSE), mean deviation (RMD), mean absolute error (MAE), and model efficiency (EF). The results showed that the TP content of the top soil layer in wheat lands (0.89 ± 0.01 g kg−1) was significantly higher than that in paddy lands (0.63 ± 0.01 g kg−1). The annual average precipitation, humidity, and air temperature had significant negative relationships with TP content, while the annual average atmospheric pressure and latitude had significant positive relationships with TP. Most machine learning methods performed better than the statistical learning method, with the highest r2 being 0.82. The different datasets used for model development had no significant effect on model performance. The average TP content of the top soil layer tends to be greater in wheat lands than in paddy lands after long-term cultivation.
In addition to the statistical parameters (the average, maximum, and minimum values) of each climate parameter, comprehensive climate parameters including the annual, semiannual, quarterly, and monthly air temperature, precipitation, humidity, and atmospheric pressure should be considered in further model development. Although datasets covering different soil depths had no significant effect on model performance, machine learning methods such as random forest, XGBoost, and LightGBM are recommended over a linear learning method for soil TP model development because they perform better. Comparing different machine learning methods is recommended, as it will help build a stronger model in similar studies.
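The evaluation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the observed and predicted TP values are invented for illustration, the model names `model_a` and `model_b` are hypothetical, and EF is assumed here to be the Nash-Sutcliffe-style modelling efficiency (1 minus the ratio of residual to total sum of squares). RMD is omitted because the abstract does not give its formula.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def model_efficiency(observed, predicted):
    """Modelling efficiency (assumed Nash-Sutcliffe form): 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical TP values in g/kg, made up for this illustration only.
observed = [0.60, 0.70, 0.80, 0.90, 1.00]
predictions = {
    "model_a": [0.62, 0.68, 0.83, 0.88, 0.97],  # tracks the observations closely
    "model_b": [0.80, 0.80, 0.80, 0.80, 0.80],  # always predicts the observed mean
}

# Score every candidate model with the same metrics.
scores = {
    name: {
        "rmse": rmse(observed, pred),
        "mae": mae(observed, pred),
        "ef": model_efficiency(observed, pred),
    }
    for name, pred in predictions.items()
}

# Pick the model with the lowest RMSE as the preferred one.
best = min(scores, key=lambda name: scores[name]["rmse"])

for name, s in scores.items():
    print(f"{name}: RMSE={s['rmse']:.4f} MAE={s['mae']:.4f} EF={s['ef']:.4f}")
print("lowest RMSE:", best)
```

Under this assumed definition, a model that always predicts the observed mean has an EF of about 0, and a perfect model has an EF of 1, so EF complements the error magnitudes reported by RMSE and MAE.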