Developing automated machine learning approach for fast and robust crop yield prediction using a fusion of remote sensing, soil, and weather dataset

Ahmed Kheir,Vinay Nangia,Abdelrazek Elnashar,Mina Devakota,Mohie Omar,Til Feike,Ajit Govind
DOI: https://doi.org/10.1088/2515-7620/ad2d02
2024-02-28
Environmental Research Communications
Abstract:Estimating smallholder crop yields robustly and timely is crucial for improving agronomic practices, determining yield gaps, guiding investment, and policymaking to ensure food security. However, there is poor estimation of yield for most smallholders due to lack of technology, and field scale data, particularly in Egypt. Automated machine learning (AutoML) can be used to automate the machine learning workflow, including automatic training and optimization of multiple models within a user-specified time frame, but it has less attention so far. Here, we combined extensive field survey yield across wheat cultivated area in Egypt with diverse dataset of remote sensing, soil, and weather to predict field-level wheat yield using 22 ML models in AutoML. The models showed robust accuracies for yield predictions, recording Willmott degree of agreement, (d > 0.80) with higher accuracy when super learner (stacked ensemble) was used (R2 = 0.51, d = 0.82). The trained AutoML was deployed to predict yield using remote sensing (RS) vegetative indices (VIs), demonstrating a good correlation with actual yield (R2 = 0.7). This is very important since it is considered a low-cost tool and could be used to explore early yield predictions. Since climate change has negative impacts on agricultural production and food security with some uncertainties, AutoML was deployed to predict wheat yield under recent climate scenarios from the Coupled Model Intercomparison Project Phase 6 (CMIP6). These scenarios included single downscaled General Circulation Model (GCM) as CanESM5 and two shared socioeconomic pathways (SSPs) as SSP2-4.5and SSP5-8.5during the mid-term period (2050). The stacked ensemble model displayed declines in yield of 21 % and 5 % under SSP5-8.5 and SSP2-4.5 respectively during mid-century, with higher. The developed approach could be used as a rapid, accurate and low-cost method to predict yield for stakeholder farms all over the world where ground data is scarce.
environmental sciences
What problem does this paper attempt to address?