Strategies for Efficient Estimation of Soil Organic Content at the Local Scale Based on a National Spectral Database

Hongyi Li, Yuheng Li, Meihua Yang, Songchao Chen, Zhou Shi
DOI: https://doi.org/10.1002/ldr.4223
2022-01-01
Land Degradation and Development
Abstract: Soil function degradation threatens the sustainable management of soil resources, and soil organic matter (SOM) is a vital factor. Powerful measurement tools will become increasingly important, especially in areas where data are poor or absent. The China Soil Visible and Near-Infrared (vis-NIR) Spectroscopy Library (CSSL) archive could help provide a solution for faster and less costly measurement of SOM. The aim of this article was to compare SOM prediction performance in the target area under three strategies: (i) general global partial least squares regression (PLSR) using CSSL with and without spiking samples; (ii) memory-based learning (MBL) using CSSL with and without spiking samples; and (iii) general PLSR using only spiking samples. When using spiking subsets, we also investigated the prediction performance of extra-weighted subsets (several copies of the spiking samples). A series of spiking subsets was randomly selected from the total spiking samples, which had been selected from the target sites by conditioned Latin hypercube sampling (cLHS). We calculated only the mean squared Euclidean distance (msd) between the estimated density functions (pds) of the principal components (PCs) of the vis-NIR spectra from the validation dataset and from the spiking subsets, and statistically inferred the optimal sampling set size to be 30. Our study showed that global PLSR using CSSL spiked with the statistically optimal local samples can achieve higher prediction performance, with a mean root mean square error (RMSE) of 5.75. MBL spiked with five extra-weighted optimal spiking samples achieved the best accuracy, with an RMSE of 3.98, an R² of 0.70, a bias of 0.04, and a Lin's concordance correlation coefficient (LCCC) of 0.81. The msd is a simple and effective method for determining an adequate spiking set size using only vis-NIR data. These accurate predictions demonstrate the usefulness of statistically representative spiking and MBL for advancing the use of large soil spectral libraries in SOM determination, which is currently lacking for the large soil spectral libraries in use.
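
The abstract reports accuracy as RMSE, R², bias, and LCCC. As a minimal sketch of how such statistics are commonly computed from observed and predicted SOM values, the Python below uses only the standard library. The function names are hypothetical, and the exact conventions are assumptions rather than the authors' stated definitions: bias is taken as the mean of predicted minus observed, R² as the coefficient of determination, and LCCC as Lin's concordance correlation coefficient with population (divide-by-n) variances.

```python
import math
from statistics import fmean


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(fmean((p - o) ** 2 for o, p in zip(observed, predicted)))


def bias(observed, predicted):
    """Mean error; assumed sign convention: predicted minus observed."""
    return fmean(p - o for o, p in zip(observed, predicted))


def r2(observed, predicted):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot)."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def lccc(observed, predicted):
    """Lin's concordance correlation coefficient with population variances."""
    mean_obs, mean_pred = fmean(observed), fmean(predicted)
    var_obs = fmean((o - mean_obs) ** 2 for o in observed)
    var_pred = fmean((p - mean_pred) ** 2 for p in predicted)
    cov = fmean(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    return 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    obs = [10.0, 20.0, 30.0, 40.0]
    pred = [12.0, 18.0, 33.0, 41.0]
    print(rmse(obs, pred), r2(obs, pred), bias(obs, pred), lccc(obs, pred))
```

Unlike R², LCCC penalizes both scatter and systematic offset from the 1:1 line, which is one reason it is often reported alongside RMSE and bias when comparing calibration strategies.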