Maize Yield and Nitrate Loss Prediction with Machine Learning Algorithms

Mohsen Shahhosseini, Rafael A. Martinez-Feria, Guiping Hu, Sotirios V. Archontoulis
DOI: https://doi.org/10.1088/1748-9326/ab5268
2020-11-07
Abstract: Pre-season prediction of crop production outcomes such as grain yields and N losses can provide insights to stakeholders when making decisions. Simulation models can assist in scenario planning, but their use is limited because of data requirements and long run times. Thus, there is a need for more computationally expedient approaches to scale up predictions. We evaluated the potential of five machine learning (ML) algorithms as meta-models for a cropping systems simulator (APSIM) to inform future decision-support tool development. We asked: 1) How well do ML meta-models predict maize yield and N losses using pre-season information?; 2) How many data are needed to train ML algorithms to achieve acceptable predictions?; 3) Which input data variables are most important for accurate prediction?; and 4) Do ensembles of ML meta-models improve prediction? The simulated dataset included more than 3 million genotype, environment and management scenarios. Random forests most accurately predicted maize yield and N loss at planting time, with an RRMSE of 14% and 55%, respectively. ML meta-models reasonably reproduced simulated maize yields but not N loss. They also differed in their sensitivities to the size of the training dataset. Across all ML models, yield prediction error decreased by 10-40% as the training dataset increased from 0.5 to 1.8 million data points, whereas N loss prediction error showed no consistent pattern. ML models also differed in their sensitivities to input variables. Averaged across all ML models, weather conditions, soil properties, management information and initial conditions were roughly equally important when predicting yields. Modest prediction improvements resulted from ML ensembles. These results can help accelerate progress in coupling simulation models and ML toward developing dynamic decision support tools for pre-season management.
Other Quantitative Biology, Machine Learning, Applications
What problem does this paper attempt to address?
This paper asks whether machine-learning algorithms can predict maize yield and nitrate loss before the growing season begins (i.e., before planting). Specifically, the researchers evaluated four machine-learning algorithms (LASSO regression, ridge regression, random forest, and extreme gradient boosting), along with their ensembles, as metamodels of a cropping systems simulator (APSIM) to inform the development of future decision-support tools. The paper explores four questions:

1. **How well do machine-learning metamodels predict maize yield and nitrogen loss using pre-planting information?**
2. **How much data is required to train machine-learning algorithms to achieve acceptable prediction accuracy?**
3. **Which input data variables are the most important for accurate prediction?**
4. **Do ensembles of machine-learning metamodels improve prediction performance?**

By answering these questions, the research aims to support more efficient and dynamic decision-support systems, so that farmers and agronomists can make better-informed management decisions before planting. Such tools could help increase crop yields while reducing environmental impacts such as nitrogen loss.
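
The abstract reports accuracy as RRMSE and asks whether ensembles improve prediction. The sketch below illustrates both ideas under stated assumptions: RRMSE is taken here to be the root-mean-square error divided by the mean of the reference values, expressed as a percentage (the text above does not give the paper's exact formula), and the ensemble is a plain unweighted average of model predictions (the text does not say how the paper's ensembles were built). The function names and all numbers are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean


def rrmse(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the mean of the observed values.

    Assumed definition; the summarized paper may normalize differently.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = mean((o - p) ** 2 for o, p in zip(observed, predicted))
    return 100.0 * sqrt(mse) / mean(observed)


def average_ensemble(*model_predictions):
    """Element-wise unweighted mean of several models' predictions (assumed scheme)."""
    return [mean(values) for values in zip(*model_predictions)]


# Made-up values standing in for simulator output and two meta-model predictions.
simulated_yield = [10.0, 12.0, 8.0, 10.0]
model_a = [11.0, 11.0, 9.0, 9.0]
model_b = [9.0, 12.0, 8.0, 11.0]

ensemble = average_ensemble(model_a, model_b)

for name, preds in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(f"{name}: RRMSE = {rrmse(simulated_yield, preds):.2f}%")
```

In this toy case the averaged prediction scores a lower RRMSE than either model because their errors partly cancel; that is a property of the invented numbers, and the abstract itself reports only modest gains from ensembles.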