Corn Grain Yield Prediction Using UAV-Based High Spatiotemporal Resolution Imagery, Machine Learning, and Spatial Cross-Validation

Patrick Killeen, Iluju Kiringa, Tet Yeap, Paula Branco
DOI: https://doi.org/10.3390/rs16040683
IF: 5
2024-02-15
Remote Sensing
Abstract: Food demand is expected to rise significantly by 2050 due to the increase in population; additionally, receding water levels, climate change, and a decrease in the amount of available arable land will threaten food production. To address these challenges and increase food security, input cost reductions and yield optimization can be accomplished using yield precision maps created by machine learning models; however, without considering the spatial structure of the data, the accuracy assessment of a precision map risks being over-optimistic, which may encourage poor decision making that can lead to negative economic impacts (e.g., lowered crop yields). In fact, most machine learning research involving spatial data, including the unmanned aerial vehicle (UAV) imagery-based yield prediction literature, ignores spatial structure and likely obtains over-optimistic results. The present work is a UAV imagery-based corn yield prediction study that analyzed the effects of image spatial and spectral resolution, image acquisition date, and model evaluation scheme on model performance. We used various spatial generalization evaluation methods, including spatial cross-validation (CV), to (a) identify over-optimistic models that overfit to the spatial structure found inside datasets and (b) estimate true model generalization performance. We compared and ranked the prediction power of 55 vegetation indices (VIs) and five spectral bands over a growing season. We gathered yield data and UAV-based multispectral (MS) and red-green-blue (RGB) imagery from a Canadian smart farm and trained random forest (RF) and linear regression (LR) models using 10-fold CV and spatial CV approaches. We found that imagery from the middle of the growing season produced the best results. RF and LR generally performed best with high and low spatial resolution data, respectively. MS imagery led to generally better performance than RGB imagery.
Some of the best-performing VIs were the simple ratio index (near-infrared and red-edge), the normalized difference red-edge index, and the normalized green index. We found that 10-fold CV coupled with spatial CV could be used to identify over-optimistic yield prediction models. When using high spatial resolution MS imagery, RF and LR obtained correlation coefficients (CC) of 0.81 and 0.56, respectively, under 10-fold CV, and 0.39 and 0.41, respectively, under a k-means-based spatial CV approach. Furthermore, when using only location features, RF and LR obtained an average CC of 1.00 and 0.49, respectively. This suggested that LR had better spatial generalizability than RF, and that RF was likely over-optimistic and overfitting to the spatial structure of the data.
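As an illustration of two of the indices named in the abstract, the sketch below computes the simple ratio index (near-infrared over red-edge) and the normalized difference red-edge index (NDRE) from per-pixel reflectance arrays, using their commonly cited definitions. The function names, the `eps` guard against division by zero, and the sample reflectance values are assumptions made for this example and are not taken from the paper. The normalized green index is left out because its definition varies between sources and the abstract does not say which form was used.

```python
import numpy as np


def simple_ratio(nir, red_edge, eps=1e-12):
    """Simple ratio index for the (NIR, red-edge) band pair: NIR / RE."""
    nir = np.asarray(nir, dtype=float)
    red_edge = np.asarray(red_edge, dtype=float)
    # eps is an illustrative guard against division by zero.
    return nir / (red_edge + eps)


def ndre(nir, red_edge, eps=1e-12):
    """Normalized difference red-edge index: (NIR - RE) / (NIR + RE)."""
    nir = np.asarray(nir, dtype=float)
    red_edge = np.asarray(red_edge, dtype=float)
    return (nir - red_edge) / (nir + red_edge + eps)


# Made-up 2x2 reflectance rasters, values in [0, 1].
nir_band = np.array([[0.6, 0.5], [0.4, 0.8]])
red_edge_band = np.array([[0.3, 0.25], [0.2, 0.2]])

sr_map = simple_ratio(nir_band, red_edge_band)
ndre_map = ndre(nir_band, red_edge_band)
print(sr_map)
print(ndre_map)
```

Both functions operate element-wise, so the same calls would apply to full-size band rasters of matching shape.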
environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary
What problem does this paper attempt to address?
The main problem this paper addresses is predicting corn yield from high spatiotemporal resolution imagery captured by unmanned aerial vehicles (UAVs), using machine learning methods evaluated with spatial cross-validation. Specifically, the study aims to:

1. **Reveal and avoid overly optimistic model performance**: Most existing UAV imagery-based crop yield prediction studies ignore the spatial structure of the data, which leads to overly optimistic model evaluation results. This paper applies spatial cross-validation to identify models that overfit to the spatial structure of the data and to estimate the true generalization performance of the models.
2. **Determine the optimal image acquisition time**: The study analyzes how the date on which images are captured during the growing season affects yield prediction results, with the goal of minimizing the number of UAV flight missions.
3. **Evaluate the best vegetation indices (VIs)**: The performance of various vegetation indices is compared to determine which are most suitable for corn yield prediction.
4. **Feasibility of low-cost cameras**: The study explores whether low-cost RGB cameras can be used instead of expensive multispectral (MS) cameras for yield prediction.
5. **Comparison of raw bands and vegetation indices**: The study investigates whether raw bands can be used directly for prediction instead of computing vegetation indices, which would simplify the prediction process.
6. **Comparison of satellite imagery and UAV imagery**: The study evaluates whether lower-cost satellite imagery can achieve prediction results similar to those of UAV imagery.

In summary, this research aims to improve the accuracy and reliability of corn yield prediction by strengthening model evaluation methods and optimizing data collection strategies.
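The contrast between random 10-fold CV and k-means-based spatial CV can be sketched on synthetic data. The example below is a minimal illustration under stated assumptions, not the authors' pipeline: the field is a made-up 30 x 30 grid with a smooth, spatially autocorrelated yield surface; a 1-nearest-neighbour predictor on location features stands in for a flexible model such as RF; the k-means routine is a plain Lloyd's iteration written in NumPy; and all function names are hypothetical.

```python
import numpy as np


def kmeans_fold_ids(coords, k, n_iter=50, seed=0):
    """Assign each sample to one of k spatial folds using Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    centers = coords[rng.choice(len(coords), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = coords[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(axis=0)
    dist = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def random_fold_ids(n, k, seed=0):
    """Assign samples to k folds at random, ignoring their location."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n) % k


def nearest_neighbour_predict(train_xy, train_y, test_xy):
    """Predict each test sample as the value of its closest training sample."""
    dist = np.linalg.norm(test_xy[:, None, :] - train_xy[None, :, :], axis=2)
    return train_y[dist.argmin(axis=1)]


def mean_cv_cc(coords, y, fold_ids):
    """Mean Pearson correlation coefficient over the held-out folds."""
    scores = []
    for fold in np.unique(fold_ids):
        test = fold_ids == fold
        pred = nearest_neighbour_predict(coords[~test], y[~test], coords[test])
        scores.append(np.corrcoef(y[test], pred)[0, 1])
    return float(np.mean(scores))


# Synthetic field: smooth spatial trend plus a little noise (illustrative only).
side = 30
gx, gy = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
coords = np.column_stack([gx.ravel(), gy.ravel()])
rng = np.random.default_rng(42)
synthetic_yield = (
    np.sin(6 * coords[:, 0])
    + np.cos(6 * coords[:, 1])
    + rng.normal(0, 0.05, len(coords))
)

cc_random = mean_cv_cc(coords, synthetic_yield, random_fold_ids(len(coords), 10))
cc_spatial = mean_cv_cc(coords, synthetic_yield, kmeans_fold_ids(coords, 10))
print(f"random 10-fold CC:  {cc_random:.2f}")
print(f"k-means spatial CC: {cc_spatial:.2f}")
```

With random folds, nearly every held-out point has an adjacent training point, so a model that only memorizes location scores well. With spatial folds, whole neighbourhoods are held out together and the same model has to extrapolate, so its score would be expected to drop. A large gap between the two scores is the kind of signal the study uses to flag models that overfit to spatial structure.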