Potential of Globally Distributed Topsoil Mid-Infrared Spectral Library for Organic Carbon Estimation
Yongsheng Hong, Jonathan Sanderman, Tomislav Hengl, Songchao Chen, Nan Wang, Jie Xue, Zhiqing Zhuo, Jie Peng, Shuo Li, Yiyun Chen, Yaolin Liu, Abdul Mounem Mouazen, Zhou Shi
DOI: https://doi.org/10.1016/j.catena.2023.107628
IF: 6.367
2024-01-01
CATENA
Abstract: Accurate monitoring of soil organic carbon (SOC) is critical for sustainable soil management aimed at improving soil quality, function, and carbon sequestration. As a nondestructive, efficient, and low-cost technique, mid-infrared (MIR) spectroscopy has shown great potential for rapid estimation of SOC, although studies at the global scale remain limited. The objective of this work was to use a globally distributed topsoil MIR spectral library with 33,039 samples to predict SOC using different modeling methods. The effects of nine fractional-order derivatives (FODs) on the prediction accuracy of SOC were evaluated using four regression algorithms (i.e., ratio index-based linear regression, RI-LR; partial least squares regression, PLSR; Cubist; convolutional neural network, CNN). A square-root transformation was applied to the SOC data to reduce skewness and non-linearity. Results indicated that FOD captured subtle spectral details related to SOC, leading to improved predictions that may not be achievable with raw absorbance or common integer-order derivatives. For the RI-LR models, the optimal validation result for SOC was obtained with the 0.75-order derivative, with a ratio of performance to inter-quartile distance (RPIQ) of 1.85. For full-spectrum modeling of SOC, the CNN outperformed the PLSR and Cubist models, regardless of whether raw absorbance or any of the eight FODs was used; the best-performing CNN model was obtained with the 1.25-order derivative (validation RPIQ = 6.33). It can be concluded that accurate estimation of SOC at the global scale is feasible using a large and diverse MIR spectral library combined with a deep-learning CNN model. This global-scale database is extremely valuable for addressing the shortage of soil data and for monitoring soils at different geographical scales.
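The two quantities the abstract relies on most, the fractional-order derivative of a spectrum and the RPIQ score, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not state which FOD formulation was used, so the Grünwald-Letnikov definition is assumed here because it is a common choice for spectral data, and RPIQ is assumed to be the interquartile range of the observed values divided by the RMSE. The function names `gl_fractional_derivative` and `rpiq`, and the `step` parameter, are hypothetical.

```python
import numpy as np


def gl_fractional_derivative(spectra, order, step=1.0):
    """Grünwald-Letnikov fractional derivative along the last axis.

    Assumption: the paper's FOD formulation is not given in the abstract;
    this is one common definition. `step` is the band spacing.
    """
    spectra = np.asarray(spectra, dtype=float)
    n = spectra.shape[-1]

    # Weights w_k = (-1)^k * binom(order, k), built with the recurrence
    # w_k = w_{k-1} * (k - 1 - order) / k, starting from w_0 = 1.
    weights = np.empty(n)
    weights[0] = 1.0
    for k in range(1, n):
        weights[k] = weights[k - 1] * (k - 1 - order) / k

    out = np.zeros_like(spectra)
    for i in range(n):
        # Sum over the current band and all preceding bands:
        # sum_{k=0..i} w_k * f(x_{i-k})
        out[..., i] = spectra[..., i::-1] @ weights[: i + 1]
    return out / step ** order


def rpiq(observed, predicted):
    """Ratio of performance to inter-quartile distance: (Q3 - Q1) / RMSE."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    q1, q3 = np.percentile(observed, [25, 75])
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return (q3 - q1) / rmse
```

With this definition, order 0 returns the input unchanged and order 1 reduces to a backward first difference, so non-integer orders such as 0.75 or 1.25 interpolate between the familiar integer-order cases. If the target is modeled on a square-root scale, as the abstract describes for SOC, predictions would need to be squared before being passed to `rpiq` on the original scale.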