Abstract:<p>Predictive performance is important to many applications of species distribution models (SDMs). The SDM 'ensemble' approach, which combines predictions across different modelling methods, is believed to improve predictive performance and is used in many recent SDM studies. Here, we aim to compare the predictive performance of ensemble species distribution models to that of individual models, using a large presence–absence dataset of eucalypt tree species. To test model performance, we divided our dataset into calibration and evaluation folds using two spatial blocking strategies (checkerboard‐pattern and latitudinal slicing). We calibrated and cross‐validated all models within the calibration folds, using both repeated random division of data (a common approach) and spatial blocking. Ensembles were built using the software package 'biomod2', with standard ('untuned') settings. Boosted regression tree (BRT) models were also fitted to the same data, tuned according to published procedures. We then used the evaluation folds to compare ensembles both against their component untuned individual models and against the tuned BRTs. We used area under the receiver‐operating characteristic curve (AUC) and log‐likelihood to assess model performance. Ensemble models performed well in all our tests, but they did not consistently outperform their component untuned individual models or the tuned BRTs. Moreover, selecting the untuned individual models with the best cross‐validation performance also yielded good external performance; in this study, blocked cross‐validation was better suited to this selection than repeated random cross‐validation. The latitudinal slice test was possible for only four species; it showed some individual models, particularly the tuned BRT, performing better than the ensembles. This study shows no particular benefit of using ensembles over tuned individual models.
It also suggests that further robust testing of performance is required for situations where models are used to predict to distant places or environments.</p>
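The evaluation design in the abstract has three pieces: spatial fold assignment (checkerboard and latitudinal slices), an ensemble that combines model predictions, and two scores (AUC and log‐likelihood). The sketch below illustrates these pieces in plain Python under stated assumptions. It is not the authors' code and not the 'biomod2' API, which is an R package. The function names, the `cell_size` and `breaks` parameters, and the toy numbers in the demo are all hypothetical. The unweighted mean is just one simple way to combine models and is not necessarily the rule the study's ensembles used. Log‐likelihood here is the summed Bernoulli log‐probability of the observed presences and absences, because the abstract does not state any normalisation.

```python
import bisect
import math


def checkerboard_fold(lon, lat, cell_size):
    """Assign a point to fold 0 or 1 by the parity of its grid cell.

    cell_size (hypothetical) is in the same units as lon/lat; adjacent cells
    alternate folds, like squares on a checkerboard.
    """
    col = math.floor(lon / cell_size)
    row = math.floor(lat / cell_size)
    return (col + row) % 2  # Python's % with modulus 2 gives 0 or 1, even for negative cells


def latitudinal_slice(lat, breaks):
    """Return the index of the latitude band containing lat.

    breaks (hypothetical) is a sorted list of band boundaries; a point lying
    exactly on a boundary is placed in the band above it.
    """
    return bisect.bisect_right(breaks, lat)


def auc(y_true, scores):
    """Rank-based AUC: chance a random presence outscores a random absence (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_likelihood(y_true, probs, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 outcomes; higher (closer to 0) is better."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def mean_ensemble(*model_probs):
    """Unweighted mean of several models' predicted probabilities, site by site."""
    return [sum(site) / len(site) for site in zip(*model_probs)]


if __name__ == "__main__":
    # Made-up evaluation-fold labels and predictions from two hypothetical models.
    y_eval = [1, 0, 1, 0, 1]
    model_a = [0.9, 0.4, 0.6, 0.3, 0.2]
    model_b = [0.7, 0.2, 0.5, 0.6, 0.8]
    ensemble = mean_ensemble(model_a, model_b)
    for name, probs in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
        print(name, round(auc(y_eval, probs), 3), round(log_likelihood(y_eval, probs), 3))
```

The two scores reward different things. AUC depends only on how the predictions rank presences against absences. Log‐likelihood also penalises poorly calibrated probabilities. An ensemble can therefore improve one score without improving the other.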

A randomized permutation whole-model test heuristic for Self-Validated Ensemble Models (SVEM)

Non-parametric Tests for the Tail Equivalence Via Empirical Likelihood

Sequential Permutation Testing of Random Forest Variable Importance Measures

Testing for no effect in regression problems: a permutation approach

Generalized Permutation Framework for Testing Model Variable Significance

Permutation Tests for Assessing Potential Non-Linear Associations between Treatment Use and Multivariate Clinical Outcomes

Permutation-based multiple testing when fitting many generalized linear models

The Classification Permutation Test: A Nonparametric Test for Equality of Multivariate Distributions

A Goodness-of-fit Test for Parametric and Semi-Parametric Models in Multiresponse Regression

An Empirical Comparison of Parametric and Permutation Tests for Regression Analysis of Randomized Experiments

Functional Response Designs via the Analytic Permutation Test

A Permutation Test for Assessing the Presence of Individual Differences in Treatment Effects

Testing Predictor Significance with Ultra High Dimensional Multivariate Responses

Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models

Testing against ordered alternatives in one-way ANOVA model with exponential errors

A Note on Standard Errors for Multidimensional Two-Parameter Logistic Models Using Gaussian Variational Estimation

How Many Ratings per Item are Necessary for Reliable Significance Testing?

A Computational Note on the Application of the Supplemented EM Algorithm to Item Response Models

Permutation tests for detecting and estimating mixtures in task performance within groups

The InterModel Vigorish as a Lens for Understanding (and Quantifying) the Value of Item Response Models for Dichotomously Coded Items