tidysdm: leveraging the flexibility of tidymodels for Species Distribution Modelling in R
Michela Leonardi, Margherita Colucci, Andrea Vittorio Pozzi, Eleanor M.L. Scerri, Andrea Manica
DOI: https://doi.org/10.1101/2023.07.24.550358
2024-06-28
Abstract: In species distribution modelling (SDM), it is common practice to explore multiple machine-learning algorithms and combine their results into ensembles. This is challenging in R, because the different algorithms were developed independently and use inconsistent syntax and data structures. Specialised SDM packages solve this problem by wrapping the algorithms in complex functions that handle each one's specific requirements. However, creating and maintaining such interfaces is time-consuming, and there is no easy way to integrate other methods as they become available. Here we present tidysdm, an R package that solves this problem by taking advantage of the tidymodels universe. Because tidymodels is part of the tidyverse, it (i) has a standardised grammar, data structures and interface for modelling, (ii) includes packages designed for fitting, tuning and validating various models, and (iii) allows easy integration of new algorithms and methods. tidysdm enables easy and flexible SDM by supporting standard algorithms, adding SDM-oriented functions, and allowing any procedure to be used to fit, tune and validate different models. Additionally, it provides functions to easily fit models based on palaeo/time-scattered data. tidysdm includes two vignettes detailing standard procedures for present-day and time-scattered data. Users can supply any standard-format climatic data as input, but we also showcase the integration with the package pastclim, which provides easier access to present, past and future climate. An additional vignette illustrates how to leverage other tidyverse packages to enhance the tidysdm workflow. Finally, a section on the website helps users troubleshoot common problems with tidymodels.
Ecology