Analyses of the variable selection using correlation methods: An approach to the importance of statistical inferences in the modelling process
Mauricio Díaz-Vallejo, Alexander Peña-Peniche, Claudio Mota-Vargas, Javier Piña-Torres, Daniel Valencia-Rodríguez, Coral E. Rangel-Rivera, Juliana Gaviria-Hernández, Octavio Rojas-Soto
DOI: https://doi.org/10.1016/j.ecolmodel.2024.110893
IF: 3.1
2024-09-29
Ecological Modelling
Abstract: Selecting the best set of variables in ecological niche models (ENM) and species distribution models (SDM) has become a topic of interest in correlative models, leading to the use of statistical methods to estimate the relationships between variables. However, selecting sets of variables requires several decisions, such as choosing sources of information (i.e., species records and calibration areas) and statistical methods to optimize the modelling process while preventing the overestimation of parameters. In the present study, we analyzed four scenarios for selecting variables in ENM/SDM, combining two correlation methods (Pearson and Spearman) with two strategies for extracting the variables' information (species records and calibration areas). First, we conducted a bibliographic review to determine the most used methods to select variables. Of the 150 articles selected, 134 applied correlation methods: 47 used Pearson, 18 used Spearman, and the remaining 69 did not specify the type of correlation method. Also, 19 articles employed species records, 20 used calibration areas, and 95 did not specify how they selected variables, showing the absence of clarity and consistency in variable selection. Then, we explored the same four combinations for 56 bird species. We conducted normality tests for the variables per species and found a tendency for non-normal distributions. Furthermore, we performed Pearson and Spearman correlations using species records and calibration areas as extraction strategies and discussed the differences between them. Finally, we built different sets of variables and performed SDM for six species, and found that the composition of the selected set of variables differs depending on the strategy used. Our findings highlight the absence of clarity and consistency in describing the correlation coefficients commonly used for environmental variable selection and emphasize the significant implications of this choice.
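The contrast between the two correlation methods can be illustrated with a minimal sketch. It is not the authors' code: the variable names, the synthetic values, and the 0.8 cutoff are all assumptions chosen for illustration, and the input table could equally hold values extracted at species records or across a calibration area. The sketch uses only the Python standard library and shows how a monotonic but non-linear relationship can be flagged by Spearman and missed by Pearson.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlated_pairs(table, method, threshold=0.8):
    """Variable pairs whose |correlation| reaches the (assumed) threshold."""
    names = sorted(table)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(method(table[a], table[b])) >= threshold:
                flagged.append((a, b))
    return flagged


# Synthetic, hypothetical values: the second variable grows exponentially
# with the first, so the relationship is monotonic but far from linear.
temperature = [float(t) for t in range(1, 11)]
records = {
    "temperature": temperature,
    "precipitation": [math.exp(t) for t in temperature],
}

r_pearson = pearson(records["temperature"], records["precipitation"])
r_spearman = spearman(records["temperature"], records["precipitation"])
pairs_pearson = correlated_pairs(records, pearson)
pairs_spearman = correlated_pairs(records, spearman)
print(f"Pearson: {r_pearson:.2f}, Spearman: {r_spearman:.2f}")
print("Flagged by Pearson:", pairs_pearson)
print("Flagged by Spearman:", pairs_spearman)
```

With these synthetic values the Pearson coefficient falls below the assumed cutoff while the Spearman coefficient reaches it, so the two methods would retain different sets of variables. Running the same comparison on values drawn from species records and from a calibration area would cover the four combinations described in the abstract.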
ecology