Characterization and classification of fine-resolution soil profile for precision agriculture using random forest and self-organizing map

Ani A. Elias, Megha Sharma, Shailendra Goel
DOI: https://doi.org/10.1101/2024.04.02.587707
2024-04-03
Abstract: The availability of high-throughput soil profile information is an important component of precision agriculture, enabling efficient soil management for sustainable production. We collected 14 soil physicochemical features at fine resolution from Nagpur, Pune, and Haveri, which represent target environments for safflower cultivation, and from our experiment station in Delhi, and created graphical maps to depict their variability. Additionally, we evaluated the predictive ability of two statistical learning models, random forest (RF) and self-organizing maps (SOM), against multinomial regression models for correctly classifying the soil profile. Clustering was performed with the partitioning around medoids (PAM) model, using the medoids derived from the dissimilarity matrices of these models. The robustness, versatility, and predictive ability of the models in correctly assigning soil profiles to clusters were then tested using cross-validation repeated 100 times. The training data made up 60% to 95% of the observations, and the unit area of observation was increased up to nine times (equivalently, the total number of observations was reduced to as little as one ninth). The RF model performed best, with average prediction accuracy above 85% in all settings and close to 100% in some. The predictive ability of all the models was maintained even when only the six most influential variables were used for classification. The optimal training population size for prediction was 70% to 80%. Based on our study, we recommend i) collecting fine-resolution edaphic features from a marginal farm before the crop season; ii) using an RF or SOM model to identify the most influential features distinguishing the soil samples; and iii) expanding the area of sample collection, measuring the most influential features, and using an RF model to predict the class to which each new soil sample belongs.
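The abstract does not spell out how a random forest yields a dissimilarity matrix. One common construction, assumed here, uses RF proximities: two samples count as close when they land in the same terminal leaf in many trees. The input is a per-tree leaf-index table, which in scikit-learn can be obtained from `RandomForestClassifier.apply(X)`. The helper names `rf_proximity` and `rf_dissimilarity` are hypothetical, and the conversion `1 - proximity` is only one convention (some tools use `sqrt(1 - proximity)`); the paper may have used a different one.

```python
from typing import List, Sequence


def rf_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Proximity matrix from per-tree leaf assignments.

    leaves[i][t] is the leaf index that sample i reaches in tree t, e.g. the
    (n_samples, n_trees) array returned by RandomForestClassifier.apply(X).
    Proximity(i, j) is the fraction of trees in which i and j share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
            prox[i][j] = prox[j][i] = shared / n_trees  # matrix is symmetric
    return prox


def rf_dissimilarity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    # Assumed convention (hypothetical helper): dissimilarity = 1 - proximity,
    # so every sample is at distance 0 from itself.
    return [[1.0 - p for p in row] for row in rf_proximity(leaves)]
```

For example, with `leaves = [[0, 1, 2], [0, 1, 3], [4, 5, 6]]`, samples 0 and 1 share a leaf in two of three trees, so their dissimilarity is 1/3, while sample 2 is at dissimilarity 1 from both.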
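Partitioning around medoids (PAM) needs only a dissimilarity matrix, which is why it pairs naturally with distances derived from RF or SOM models. The sketch below is a simplified pure-Python version, assuming a square, symmetric matrix: a greedy BUILD step picks initial medoids, and a SWAP step accepts any medoid/non-medoid exchange that lowers the total distance to the nearest medoid. This first-improvement variant is an illustrative assumption, not necessarily the exact algorithm or software used by the authors, and `pam` and `total_cost` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def total_cost(d: Sequence[Sequence[float]], medoids: List[int]) -> float:
    # Sum over all points of the dissimilarity to their nearest medoid.
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def pam(d: Sequence[Sequence[float]], k: int,
        max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Return (medoid indices, cluster label per point) for k clusters."""
    n = len(d)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids: List[int] = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: total_cost(d, medoids + [c]))
        medoids.append(best)
    # SWAP: exchange a medoid with a non-medoid whenever that lowers the cost.
    cost = total_cost(d, medoids)
    for _ in range(max_iter):
        improved = False
        for mi in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                candidate = medoids[:mi] + [c] + medoids[mi + 1:]
                new_cost = total_cost(d, candidate)
                if new_cost < cost:
                    medoids, cost, improved = candidate, new_cost, True
        if not improved:  # no exchange helps: local optimum reached
            break
    # Assign each point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels
```

For real analyses, an established implementation such as `pam()` in R's `cluster` package is preferable; the sketch only shows why a dissimilarity matrix from RF or SOM is all the clustering step requires.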
Plant Biology
What problem does this paper attempt to address?
The main aim of this paper is to address the following issues:
1. **Creating Soil Maps**: The researchers conducted fine-resolution soil profile sampling in different regions of India and created graphical maps that reflect variation in soil physical and chemical characteristics.
2. **Soil Sample Classification**: The collected data were used to classify soil samples from target environments (TEs) for safflower cultivation. Specifically, the samples were classified using two statistical learning models: random forest (RF) and self-organizing maps (SOM).
3. **Evaluating Model Predictive Ability**: The performance of RF, SOM, and multinomial regression models in correctly classifying soil samples was compared to determine the best model. The predictive ability of the models was also evaluated under different training data proportions and different unit areas of observation.
4. **Identifying Key Influential Factors**: The factors most influential for soil sample classification were identified; for example, phosphorus, potassium, sodium, calcium, carbon, and sand were found to have a significant impact on classification.
5. **Optimizing Training Data Volume**: The amount of training data required for optimal predictive accuracy was investigated; using 70% to 80% of the data for training gave the best results.

In summary, this study develops a methodological framework to support soil management practices in precision agriculture. By acquiring and analyzing high-throughput soil information, it contributes to the goal of sustainable agricultural production.
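The study compares models across observation areas, increasing the unit area up to nine times so that the number of observations drops to as little as one ninth. Assuming the samples lie on a regular grid and that merged cells are summarized by their mean (the source does not say how values were combined), this coarsening can be sketched by averaging non-overlapping blocks; a 3 x 3 block gives the ninefold case. `coarsen_grid` is a hypothetical helper.

```python
from typing import List, Sequence


def coarsen_grid(grid: Sequence[Sequence[float]], bh: int, bw: int) -> List[List[float]]:
    """Average non-overlapping bh x bw blocks of a regular sampling grid.

    Each output cell covers bh * bw original cells, so the unit area grows by
    that factor and the number of observations shrinks by the same factor.
    Trailing rows/columns that do not fill a complete block are dropped.
    """
    rows, cols = len(grid), len(grid[0])
    out: List[List[float]] = []
    for r0 in range(0, rows - bh + 1, bh):
        out_row = []
        for c0 in range(0, cols - bw + 1, bw):
            block = [grid[r][c] for r in range(r0, r0 + bh)
                     for c in range(c0, c0 + bw)]
            out_row.append(sum(block) / len(block))  # assumed: mean of the block
        out.append(out_row)
    return out
```

For a 6 x 6 grid of one soil feature, `coarsen_grid(grid, 3, 3)` returns a 2 x 2 grid (4 observations instead of 36), while `coarsen_grid(grid, 1, 2)` doubles the unit area. A categorical soil class would need a different rule, such as a majority vote.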
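The training-proportion result comes from cross-validation repeated 100 times at training proportions from 60% to 95%. A minimal sketch of that kind of evaluation is repeated random train/test splitting (Monte Carlo cross-validation). Here the splits are unstratified, which is an assumption, and a nearest-centroid classifier stands in for RF, SOM, or multinomial regression purely to keep the example short. `repeated_holdout` and `nearest_centroid` are hypothetical names.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], str]


def nearest_centroid(X: List[List[float]], y: List[str]) -> Predictor:
    """Stand-in classifier: predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Sequence[float]) -> str:
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return predict


def repeated_holdout(X: List[List[float]], y: List[str],
                     fit: Callable[[List[List[float]], List[str]], Predictor],
                     train_fraction: float, repeats: int = 100,
                     seed: int = 0) -> float:
    """Mean test accuracy over `repeats` random train/test splits.

    Assumes train_fraction leaves at least one sample in the test set.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accs = []
    for _ in range(repeats):
        rng.shuffle(idx)  # new random split each repetition
        n_train = round(train_fraction * len(idx))
        train, test = idx[:n_train], idx[n_train:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        accs.append(correct / len(test))
    return mean(accs)
```

A sweep such as `{p: repeated_holdout(X, y, nearest_centroid, p) for p in (0.6, 0.7, 0.8, 0.9, 0.95)}` yields one mean accuracy per training proportion, which is the kind of curve from which an optimum like 70% to 80% would be read.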
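Ranking variables and keeping only the top six requires an importance score for each feature. The source does not state which measure was used, so the sketch below assumes permutation importance, a model-agnostic relative of the random forest's mean-decrease-in-accuracy measure: shuffle one feature column and record how much accuracy drops. For brevity it scores on the data it is given rather than on out-of-bag or held-out samples. `permutation_importance`, `top_k_features`, and the feature name `"P"` in the usage note are hypothetical and unrelated to any library function of the same name.

```python
import random
from typing import Callable, List, Sequence

Predict = Callable[[Sequence[float]], str]


def accuracy(predict: Predict, X: Sequence[Sequence[float]], y: Sequence[str]) -> float:
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(predict: Predict, X: List[List[float]], y: List[str],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(predict, X_perm, y))
        scores.append(sum(drops) / n_repeats)
    return scores


def top_k_features(names: Sequence[str], scores: Sequence[float], k: int = 6) -> List[str]:
    """Names of the k features with the largest importance scores."""
    order = sorted(range(len(names)), key=lambda j: scores[j], reverse=True)
    return [names[j] for j in order[:k]]
```

If a model relies only on feature 0 (say, a hypothetical `"P"` column), shuffling that column lowers accuracy while shuffling an unused column leaves it unchanged, so `top_k_features(["P", "noise"], scores, k=1)` returns `["P"]`.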
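A self-organizing map places a small grid of weight vectors (nodes) in feature space. Each training sample pulls its best-matching unit (BMU), and to a lesser degree the BMU's grid neighbors, toward itself, while the learning rate and neighborhood width shrink over time. The sketch below assumes a Gaussian neighborhood, linear decay schedules, and initialization from random data points; these choices and the names `train_som` and `bmu` are illustrative assumptions, not the configuration used in the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Weights = List[List[List[float]]]  # weights[row][col] is one node's feature vector


def bmu(weights: Weights, x: Sequence[float]) -> Tuple[int, int]:
    """Grid position (row, col) of the node closest to x (ties: first in row-major order)."""
    best, best_d = (0, 0), math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data: Sequence[Sequence[float]], rows: int, cols: int,
              n_iter: int = 1000, lr0: float = 0.5, sigma0: float = 1.0,
              seed: int = 0) -> Weights:
    rng = random.Random(seed)
    # Initialize each node with a copy of a randomly chosen data point.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    for t in range(n_iter):
        frac = 1.0 - t / n_iter          # decays linearly from 1 toward 0
        lr = lr0 * frac                   # learning rate
        sigma = sigma0 * frac + 1e-3      # neighborhood width, kept above 0
        x = rng.choice(data)
        br, bc = bmu(weights, x)
        for r in range(rows):
            for c in range(cols):
                # Gaussian neighborhood measured on the map grid, not in feature space.
                g2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-g2 / (2 * sigma ** 2))
                w = weights[r][c]
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])  # move node toward the sample
    return weights
```

After training, samples that share a BMU (or have nearby BMUs) can be treated as similar. Because the BMU search uses Euclidean distance, soil features measured on different scales (for example, sand percentage versus potassium content) would normally be standardized first.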