Methodology for Regional Soil Organic Matter Prediction with Spectroscopy: Optimal Sample Grouping, Input Variables, and Prediction Model

Xinle Zhang, Chang Dong, Huanjun Liu, Xiangtian Meng, Chong Luo, Yongqi Han, Hongfu Ai
DOI: https://doi.org/10.3390/rs16030565
IF: 5
2024-02-01
Remote Sensing
Abstract: Soil organic matter (SOM) is an essential component of soil and is crucial for increasing agricultural production and soil fertility. The combination of hyperspectral remote sensing and deep learning can be used to predict the SOM content efficiently, rapidly, and cost-effectively on various scales. However, determining the optimal groups, inputs, and models for reducing the spatial heterogeneity of soil nutrients in large regions and improving the accuracy of SOM prediction remains a challenge. Hyperspectral reflectance data from 1477 surface soil samples in Northeast China were utilized to evaluate three grouping methods (no groups (NG), traditional grouping (TG), and spectral grouping (SG)) and four inputs (raw reflectance (RR), continuum removal (CR), fractional-order differentiation (FOD), and spectral characteristic parameters (SCPs)). The SOM prediction accuracies of random forest (RF), convolutional neural network (CNN), and long short-term memory (LSTM) models were assessed. The results were as follows: (1) The highest accuracy was achieved using SG, SCPs, and the LSTM model, with a coefficient of determination (R²) of 0.82 and a root mean squared error (RMSE) of 0.69%. (2) The LSTM model exhibited the highest accuracy in SOM prediction (R² = 0.82, RMSE = 0.89%), followed by the CNN model (R² = 0.72, RMSE = 0.85%) and the RF model (R² = 0.69, RMSE = 0.91%). (3) SG provided higher SOM prediction accuracy than TG and NG. (4) The SCP-based prediction results were significantly better than those of the other inputs. The R² of the SCP-based model was 0.27 higher and the RMSE was 0.40% lower than those of the RR-based model with NG. In addition, the LSTM model had higher prediction errors at low (0–2%) and high (8–10%) SOM contents, whereas the error was minimal at intermediate SOM contents (2–8%). The study results provide guidance for selecting grouping methods and approaches to improve the prediction accuracy of the SOM content and reduce the spatial heterogeneity of the SOM content in large regions.
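The abstract reports accuracy as R² and RMSE and compares errors across SOM ranges (0–2%, 2–8%, 8–10%). As a minimal sketch of how such figures are commonly computed, the following uses the standard definitions of both metrics; the function names, the bin-edge handling, and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error, in the same unit as the inputs (here % SOM)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def rmse_by_range(observed, predicted, edges=(0.0, 2.0, 8.0, 10.0)):
    """RMSE per observed-SOM range; ranges are [lo, hi), the last one is [lo, hi]."""
    result = {}
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        is_last = i == len(edges) - 2
        pairs = [(o, p) for o, p in zip(observed, predicted)
                 if lo <= o < hi or (is_last and o == hi)]
        if pairs:  # skip ranges that contain no samples
            obs, pred = zip(*pairs)
            result[(lo, hi)] = rmse(obs, pred)
    return result

# Hypothetical SOM values in percent, for illustration only.
observed = [1.0, 3.0, 5.0, 9.0]
predicted = [1.5, 3.0, 5.0, 8.0]
print(r_squared(observed, predicted))
print(rmse(observed, predicted))
print(rmse_by_range(observed, predicted))
```

Because SOM is expressed in percent, an RMSE such as 0.69% is an absolute error in percentage points of SOM, not a relative error.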
environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses how to predict soil organic matter (SOM) content over large areas efficiently, quickly, and cost-effectively using spectroscopic methods. Specifically, it focuses on the following aspects:

1. **Optimal Sample Grouping**: How to group soil samples so as to reduce the spatial heterogeneity of soil nutrients over large areas.
2. **Input Variable Selection**: Which preprocessing methods and input variables improve the accuracy of SOM prediction.
3. **Prediction Model Optimization**: Which machine learning or deep learning model performs best in SOM prediction.

### Background and Challenges

- **Importance of Soil Organic Matter**: SOM is a key indicator of soil quality, fertility, and resource evaluation.
- **Limitations of Traditional Methods**: Traditional soil analysis methods are time-consuming, labor-intensive, and costly.
- **Advantages of Spectroscopy**: Hyperspectral remote sensing technology can quickly, accurately, quantitatively, and cost-effectively predict SOM content.
- **Spatial Heterogeneity**: Different soil types and environmental conditions in various regions lead to spatial heterogeneity in SOM distribution, posing challenges for prediction.

### Research Methods

- **Data Source**: Hyperspectral reflectance data from 1477 surface soil samples in Northeast China.
- **Grouping Methods**: Three grouping methods were evaluated: no grouping (NG), traditional grouping (TG), and spectral grouping (SG).
- **Input Variables**: Four input variables were evaluated: raw reflectance (RR), continuum removal (CR), fractional-order differentiation (FOD), and spectral characteristic parameters (SCPs).
- **Prediction Models**: The predictive performance of three models was evaluated: random forest (RF), convolutional neural network (CNN), and long short-term memory network (LSTM).

### Main Results

1. **Best Combination**: The combination of spectral grouping (SG), spectral characteristic parameters (SCPs), and the LSTM model achieved the highest prediction accuracy, with a coefficient of determination (R²) of 0.82 and a root mean square error (RMSE) of 0.69%.
2. **Model Performance**: The LSTM model performed best in SOM prediction (R² = 0.82, RMSE = 0.89%), followed by the CNN model (R² = 0.72, RMSE = 0.85%) and the RF model (R² = 0.69, RMSE = 0.91%).
3. **Impact of Grouping Methods**: Spectral grouping (SG) provided higher SOM prediction accuracy than traditional grouping (TG) and no grouping (NG).
4. **Impact of Input Variables**: Predictions based on spectral characteristic parameters (SCPs) were significantly better than those based on other input variables.

### Conclusion

This study provides guidance on selecting appropriate grouping methods and improving SOM content prediction models, helping to enhance the accuracy and reliability of SOM content prediction over large areas. It offers a useful reference for large-scale farmland soil fertility assessment and agricultural production.
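
Two of the input transformations named under Input Variables, continuum removal (CR) and fractional-order differentiation (FOD), have widely used textbook forms. The sketch below shows one common formulation of each: CR as reflectance divided by its upper convex hull, and FOD as a Grünwald-Letnikov difference with a unit band step. The summary does not state which formulation or parameters the authors used, so the function names, the unit-step assumption, and the sample spectrum are all hypothetical.

```python
def _cross(o, a, b):
    """2-D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def upper_hull(points):
    """Upper convex hull of points already sorted by increasing x."""
    hull = []
    for p in points:
        # Drop the last hull point while it lies on or below the line to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def continuum_removed(wavelengths, reflectance):
    """Divide each reflectance by the hull (continuum) value at its wavelength.

    Assumes at least two bands, strictly increasing wavelengths, and
    positive reflectance values.
    """
    points = list(zip(wavelengths, reflectance))
    hull = upper_hull(points)
    result = []
    j = 0
    for x, y in points:
        # Advance to the hull segment that contains x.
        while hull[j + 1][0] < x:
            j += 1
        (x0, y0), (x1, y1) = hull[j], hull[j + 1]
        continuum = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        result.append(y / continuum)
    return result

def fractional_difference(values, order):
    """Grunwald-Letnikov fractional difference, assuming a unit band step.

    Weights follow w_0 = 1, w_k = w_{k-1} * (1 - (order + 1) / k), so that
    order 0 returns the input and order 1 returns the first difference.
    """
    weights = [1.0]
    for k in range(1, len(values)):
        weights.append(weights[-1] * (1.0 - (order + 1.0) / k))
    return [
        sum(weights[k] * values[i - k] for k in range(i + 1))
        for i in range(len(values))
    ]

# Hypothetical five-band spectrum (wavelength in nm, reflectance as a fraction).
wavelengths = [400, 500, 600, 700, 800]
reflectance = [0.10, 0.20, 0.15, 0.30, 0.32]
print(continuum_removed(wavelengths, reflectance))
print(fractional_difference(reflectance, 0.5))
```

In this toy spectrum only the 600 nm band lies below the hull, so it is the only band whose continuum-removed value drops below 1, which is how CR isolates absorption features. Intermediate orders of the fractional difference fall between the raw spectrum (order 0) and the first derivative (order 1).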