Abstract: Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches increasingly account for such dependencies. However, when performing cross‐validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause of the poor performance of uncorrected (random) cross‐validation, often noted by modellers, is that dependence structures in the data persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because it is often overlooked, is that structured data also provide ample opportunity for overfitting with non‐causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross‐validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolations by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may also improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non‐random and blocked cross‐validation approaches. We also provide a series of simulations and case studies, in which we show that, across all instances tested, block cross‐validation is nearly universally more appropriate than random cross‐validation if the goal is prediction to new data or predictor space, or the selection of causal predictors.
We recommend that block cross‐validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.
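The difference between random and block cross‐validation lies entirely in how observations are assigned to folds. The sketch below is a minimal illustration in plain Python, assuming observations are identified by list indices. It contrasts random k-fold assignment with two blocking schemes: contiguous blocks along one coordinate (time, a spatial axis, or a predictor) and whole-group blocks (site, individual, random-effect level, or clade). The function names (`random_folds`, `contiguous_block_folds`, `group_folds`, `split`) are hypothetical and do not come from any package. The sketch only assigns folds. It does not fit models, and it does not add buffers between neighbouring blocks.

```python
from __future__ import annotations

import random
from collections.abc import Hashable, Sequence


def random_folds(n: int, k: int, seed: int = 0) -> list[int]:
    """Random k-fold: shuffle balanced fold labels, ignoring any structure."""
    labels = [i % k for i in range(n)]
    random.Random(seed).shuffle(labels)
    return labels


def contiguous_block_folds(coord: Sequence[float], k: int) -> list[int]:
    """Blocked k-fold along one coordinate (time, a spatial axis, or a predictor).

    Observations are ranked by `coord` and the ranking is cut into k
    contiguous blocks of near-equal size, so neighbours share a fold.
    """
    n = len(coord)
    order = sorted(range(n), key=lambda i: coord[i])
    folds = [0] * n
    for rank, i in enumerate(order):
        folds[i] = rank * k // n  # ranks 0..n-1 map onto folds 0..k-1
    return folds


def group_folds(groups: Sequence[Hashable], k: int, seed: int = 0) -> list[int]:
    """Blocked k-fold by group (site, individual, random-effect level, clade).

    Whole groups, not single observations, are dealt out to folds, so no
    group contributes to both the training and the test set of a split.
    """
    unique = list(dict.fromkeys(groups))  # first-appearance order, deterministic
    random.Random(seed).shuffle(unique)
    fold_of = {g: j % k for j, g in enumerate(unique)}
    return [fold_of[g] for g in groups]


def split(folds: Sequence[int], f: int) -> tuple[list[int], list[int]]:
    """Return (train, test) observation indices for held-out fold f."""
    train = [i for i, x in enumerate(folds) if x != f]
    test = [i for i, x in enumerate(folds) if x == f]
    return train, test


if __name__ == "__main__":
    times = list(range(12))
    print(random_folds(12, 3))               # fold labels ignore time order
    print(contiguous_block_folds(times, 3))  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    sites = ["a", "a", "b", "b", "c", "c"]
    print(group_folds(sites, 3))             # both records of each site share a fold
```

If you pass a predictor value to `contiguous_block_folds` instead of time or an easting, temporal or spatial blocking becomes blocking in predictor space. As the abstract notes, this deliberately forces extrapolation. That suits models meant to extrapolate, but it can overstate interpolation error.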

A note on the validity of cross-validation for evaluating autoregressive time series prediction

Generalised learning of time-series: Ornstein-Uhlenbeck processes

Is Cross-Validation the Gold Standard to Evaluate Model Performance?

Estimating the prediction performance of spatial models via spatial k-fold cross validation

Cross-validatory model selection for Bayesian autoregressions with exogenous regressors

Cross-validation in nonparametric regression with outliers

Cross validation for uncertain autoregressive model

Approximate leave-future-out cross-validation for Bayesian time series models

Is K-fold cross validation the best model selection method for Machine Learning?

Cross-validation: what does it estimate and how well does it do it?

Learning dynamical systems from data: A simple cross-validation perspective, part III: Irregularly-Sampled Time Series

Backtest overfitting in the machine learning era: A comparison of out-of-sample testing methods in a synthetic controlled environment

Don't Waste Your Time: Early Stopping Cross-Validation

Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models

Cross-validation of component models: A critical look at current methods

On The Smoothness of Cross-Validation-Based Estimators Of Classifier Performance

Prediction and model evaluation for space-time data

Forecast Evaluation in Large Cross-Sections of Realized Volatility

Exploring the impact of spatial autocorrelation on optimistic bias in cross-validation and assessing the effectiveness of spatial cross-validation

Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure

Selection of Uncertain Differential Equations Using Cross Validation