Abstract: The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to predict crop yields more accurately. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with greater flexibility than simulation crop modeling. However, a single machine learning model can be outperformed by a "committee" of models (machine learning ensembles) that can reduce prediction bias, variance, or both, and that can better capture the underlying distribution of the data. Yet many aspects remain to be investigated with regard to prediction accuracy, time of the prediction, and scale. The earlier in the growing season a prediction is made, the better, but this has not been thoroughly investigated because previous studies used all available data to predict yields. This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that the proposed optimized weighted ensemble and the average ensemble are the most precise models, with an RRMSE of 9.5%. Stacked LASSO makes the least biased predictions (MBE of 53 kg/ha), while the other ensemble models also outperform the base learners in terms of bias. However, even though random k-fold cross-validation is replaced by the blocked sequential procedure, stacked ensembles do not perform as well as weighted ensemble models on time series data sets, because they require the data to be IID to perform favorably. Comparing the proposed model's forecasts with the literature demonstrates their acceptable performance.
Results from the scenario with partial in-season weather knowledge reveal that decent yield forecasts with an RRMSE of 9.2% can be made as early as June 1st. Moreover, the proposed model performed better than individual models and benchmark ensembles at the agricultural district and state levels as well as at the county level. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that serves as the basis for computing feature importance for the ensemble model. The findings suggest that weather features corresponding to weeks 18–24 (May 1st to June 1st) are the most important input features.
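
The ideas named in the abstract (blocked sequential validation, RRMSE, MBE, and an optimized weighted ensemble) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy yield values, and the restriction to two base learners (which allows a closed-form weight instead of a general constrained optimization) are all assumptions made for illustration. RRMSE is taken here as RMSE divided by the mean observed yield, and MBE as the mean of prediction minus observation, which are common definitions but are not spelled out in the abstract.

```python
import math


def rrmse(y_true, y_pred):
    """Relative RMSE in percent: RMSE divided by the mean observed value."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return 100.0 * math.sqrt(mse) / (sum(y_true) / n)


def mbe(y_true, y_pred):
    """Mean bias error: average of (prediction - observation)."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def blocked_sequential_splits(years, n_val_years):
    """Yield (train_years, val_year) pairs for the last n_val_years.

    Training years always precede the validation year, so no future
    information leaks into the out-of-bag predictions.
    """
    ordered = sorted(set(years))
    for val_year in ordered[-n_val_years:]:
        yield [y for y in ordered if y < val_year], val_year


def optimal_weight(y_true, pred_a, pred_b):
    """Weight w in [0, 1] minimizing the MSE of w*pred_a + (1-w)*pred_b.

    Hypothetical two-model simplification: the unconstrained least-squares
    solution is clipped to [0, 1], which is exact for a 1-D quadratic.
    """
    diff = [a - b for a, b in zip(pred_a, pred_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:  # identical predictions: any weight gives the same result
        return 0.5
    num = sum((t - b) * d for t, b, d in zip(y_true, pred_b, diff))
    return min(1.0, max(0.0, num / denom))


# Toy out-of-bag predictions in kg/ha (made-up numbers, not from the paper).
observed = [10000.0, 11000.0, 12000.0]
model_a = [10500.0, 11500.0, 12500.0]  # over-predicts
model_b = [9500.0, 10500.0, 11500.0]   # under-predicts

w = optimal_weight(observed, model_a, model_b)
ensemble = [w * a + (1 - w) * b for a, b in zip(model_a, model_b)]
splits = list(blocked_sequential_splits([2015, 2016, 2017, 2018], 2))

print("weight for model A:", w)
print("RRMSE A / ensemble:", rrmse(observed, model_a), rrmse(observed, ensemble))
print("MBE A / B:", mbe(observed, model_a), mbe(observed, model_b))
print("splits:", splits)
```

In this toy case the two base learners have opposite biases, so the weighted combination cancels them. With more than two base learners, the weights would typically be found by a constrained optimization (non-negative weights summing to one) over the out-of-bag predictions.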

Maize Yield and Nitrate Loss Prediction with Machine Learning Algorithms

Using machine learning for crop yield prediction in the past or the future

Assessing the uncertainty of maize yield without nitrogen fertilization

Coupling machine learning and crop modeling improves crop yield prediction in the US Corn Belt

Simulating Maize Response to Split-Nitrogen Fertilization Using Easy-to-Collect Local Features

IMPLEMENTATION OF MACHINE LEARNING FOR PREDICTING MAIZE CROP YIELDS USING MULTIPLE LINEAR REGRESSION AND BACKWARD ELIMINATION

Machine Learning Approaches Can Reduce Environmental Data Requirements for Regional Yield Potential Simulation

Forecasting Corn Yield With Machine Learning Ensembles

Optimizing rice in-season nitrogen topdressing by coupling experimental and modeling data with machine learning algorithms.

Predicting in-season maize (Zea mays L.) yield potential using crop sensors and climatological data

Multi-Stage Corn Yield Prediction Using High-Resolution UAV Multispectral Data and Machine Learning Models

A neural meta model for predicting winter wheat crop yield

Modeling Long-Term Corn Yield Response to Nitrogen Rate and Crop Rotation

Comparing Machine Learning Techniques for Alfalfa Biomass Yield Prediction

Using Machine Learning Models to Predict Hydroponically Grown Lettuce Yield

Machine Learning in Evaluating Multispectral Active Canopy Sensor for Prediction of Corn Leaf Nitrogen Concentration and Yield

A data-driven crop model for maize yield prediction

A Prediction Model of Maize Field Yield Based on the Fusion of Multitemporal and Multimodal UAV Data: A Case Study in Northeast China

Integrating processed-based models and machine learning for crop yield prediction

Multi-omics assists genomic prediction of maize yield with machine learning approaches