Combining Models is More Likely to Give Better Predictions Than Single Models.
Xiaoping Hu, Laurence V Madden, Simon Edwards, Xiangming Xu
DOI: https://doi.org/10.1094/phyto-11-14-0315-r
2015-01-01
Phytopathology
Abstract: In agricultural research, it is often difficult to construct a single "best" predictive model based on data collected under field conditions. We studied the prediction performance of combining empirical linear models relative to the single best model, in relation to the number of models combined, the number of variates in the models, the magnitude of residual errors, and the weighting scheme. Two scenarios were simulated: the modeler either did or did not know the relative performance of the models to be combined. In the former case, model averaging is achieved either through weights based on the Akaike Information Criterion (AIC) statistic or through arithmetic averaging; in the latter case, only arithmetic averaging is possible (because the relative predictive performance of the models is not known for a common dataset). In addition to two experimental datasets on oat mycotoxins in relation to environmental variables, two datasets were generated assuming a consistent correlation structure among explanatory variates, with two magnitudes of residual errors. In the majority of cases, model averaging improved prediction performance over single-model predictions, especially when the modeler had no information on relative model performance. The fewer variates in the models to be combined, the greater the improvement of model averaging over single-model predictions; combining models led to very little improvement over individual models when the individual models contained many variates. Overall, simple arithmetic averaging resulted in slightly better performance than AIC-based weighted averaging. The advantage of model averaging was also noticeable for larger residual errors. This study suggests that model averaging generally performs better than single-model predictions, especially when a modeler does not have information on the relative performance of the candidate models.
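
The two weighting schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data. It assumes synthetic data, single-variate candidate models, ordinary least squares fits, the common least-squares form of AIC (n ln(RSS/n) + 2k), and the standard Akaike weights w_i proportional to exp(-0.5 (AIC_i - AIC_min)). All function and variable names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Fit a linear model with intercept by least squares; return (beta, aic)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    n = len(y)
    k = A.shape[1] + 1  # regression coefficients plus the error variance
    rss = float(resid @ resid)
    # Assumed AIC form for Gaussian least squares (additive constants dropped)
    aic = n * np.log(rss / n) + 2 * k
    return beta, aic

def predict(beta, X):
    """Predict from fitted coefficients, adding the intercept column."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta

def akaike_weights(aics):
    """Convert AIC values into normalized Akaike weights."""
    delta = np.asarray(aics, dtype=float) - np.min(aics)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Synthetic data (assumption): four explanatory variates, Gaussian residuals
rng = np.random.default_rng(0)
n_train, n_test = 60, 40
X = rng.normal(size=(n_train + n_test, 4))
y = X @ np.array([1.0, 0.5, -0.5, 0.25]) + rng.normal(size=n_train + n_test)
X_tr, X_te = X[:n_train], X[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]

# Candidate models (assumption): each uses one variate only
subsets = [[0], [1], [2], [3]]
fits = [fit_ols(X_tr[:, s], y_tr) for s in subsets]
aics = [aic for _, aic in fits]
preds = np.array([predict(beta, X_te[:, s]) for (beta, _), s in zip(fits, subsets)])

weights = akaike_weights(aics)
pred_aic = weights @ preds                  # AIC-weighted average
pred_mean = preds.mean(axis=0)              # arithmetic average
pred_best = preds[int(np.argmin(aics))]     # single best model by AIC

for name, p in [("best", pred_best), ("AIC-weighted", pred_aic), ("arithmetic", pred_mean)]:
    print(f"{name}: RMSE = {rmse(p, y_te):.3f}")
```

The AIC-weighted average requires all candidate models to be fitted to a common dataset, which is why only the arithmetic average is available when relative model performance is unknown. Which scheme gives the lowest error in this sketch depends on the simulated data and is not evidence for or against the study's findings.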