Species Distribution, Abundance and Survival Modeling: New Opportunities and Methods
I.V. Karyakin,K.I. Knizhov
DOI: https://doi.org/10.19074/1814-8654-2023-2-347-357
2023-01-01
Raptors Conservation
Abstract:Many large raptor species are currently rare, and most of them are endangered; details of their distribution, abundance, and survival are therefore the most important indicators for planning conservation and restoration measures and for assessing the impacts of anthropogenic transformation of the environment and/or climate change on their populations. Abundance and spatial distribution of the birds under study are determined during field surveys. As a result, we obtain the distribution density in individuals, pairs, or nests per unit area (for example, pairs/100 km²), or the distance between nearest or all neighbors, expressed numerically (for example, 1–5 km, on average 3.5±1.1 km) and/or graphically (ranging from simple lines connecting observation points to Delaunay triangulation and a network of polygons built from observation points). Further, to generate an estimate of abundance, one must understand the area over which these data can be extrapolated. This is challenging for many researchers: an incorrect assessment of the area of the species’ habitat distorts the estimated abundance and undermines the census effort. How can one correctly determine the area over which census data can be extrapolated? The answer to this question can be found by modeling in a GIS environment using geographic layers of environmental and spatial information, or, in current terminology, species distribution modeling (SDM). In SDM (also known as habitat or species range modeling), environmental data (climatic and spatial variables such as temperature, humidity, wind load, topography, land cover, soils, etc.; the predictor or independent variables) are calculated for geographically referenced points of a species’ presence (the dependent variable), and the species distribution is predicted using computer algorithms and mathematical methods.
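The two field-survey quantities mentioned above (nearest-neighbor distance and density extrapolated over a habitat area) can be illustrated with a minimal sketch. It is not taken from the authors' software; the coordinates, counts, and areas are made-up values, and the function names `nearest_neighbor_distances` and `extrapolate_abundance` are hypothetical.

```python
import math

def nearest_neighbor_distances(points):
    """For each (x, y) point in km, return the distance to its nearest neighbor."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances

def extrapolate_abundance(pairs_counted, surveyed_km2, habitat_km2):
    """Convert a plot count to pairs/100 km2 and scale it to the habitat area."""
    density_per_100km2 = pairs_counted / surveyed_km2 * 100.0
    estimate = density_per_100km2 * habitat_km2 / 100.0
    return density_per_100km2, estimate

# Hypothetical nest coordinates in a projected system, units of km.
nests = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (10.0, 4.0)]
nnd = nearest_neighbor_distances(nests)
mean_nnd = sum(nnd) / len(nnd)

# Hypothetical survey: 12 pairs on 400 km2 of plots, 2500 km2 of suitable habitat.
density, abundance = extrapolate_abundance(12, 400.0, 2500.0)
print(mean_nnd, density, abundance)
```

The abundance figure is only as good as `habitat_km2`, which is exactly the quantity that SDM is meant to supply.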
SDM is carried out in six stages: (1) idea conceptualization, (2) data preparation (presence and absence points or background points), (3) method selection, (4) model fitting, (5) model evaluation and (6) habitat or area map construction. 1. Conceptualization. At this stage, we formulate the main goal of the study and decide on the modeling process design based on our knowledge of the species and the study. Selecting data on the species and the environment is an important point at the initial stage. We decide whether to use only our own data or to use other available data as well; the latter will require some adjustments to the sample design. Next, we need to test the basic assumptions underlying SDM, such as whether the species is in equilibrium with the available environmental variables, whether the data is biased in any way (sampling bias, spatial autocorrelation, etc.), whether the environment has changed since the time of data collection, etc. The selection of adequate environmental and spatial variables, modeling algorithm, and model complexity should be based on the study goals and on the hypothesis regarding the relationship between the species under study and the environment in the area selected for study. 2. Data preparation. At this stage, we collect and process factual data about the species (both points of presence and points of absence) and the environment. When preparing data, particular attention should be paid to any inconsistencies in the spatial and temporal scaling of dependent and independent variables, i.e. cases where there is a large spatial or temporal difference between species and environmental data, or among the environmental data themselves (spatial and climate variables). Special attention should also be paid to the quality of georeferencing of points of presence and to the quality of species identification, which, as a rule, suffers greatly if data is collected by amateurs. In these cases, we need to decide whether to adjust the data or discard it.
All SDM algorithms require species absence information. If such information is not available, it is replaced by background points or “pseudo-absence” data, which naturally has a negative impact on the quality of the modeling, especially on a large scale. Consideration should be given in advance to how species data will be split for model training and model testing if the modeling uses all collected data and there are no plans to test the model further in the field. 3. Method selection. At this stage, we select one modeling method, or several methods to combine into ensemble models. While simple factor or cluster analyses integrated into desktop GIS were used in the early stages of modeling, today the selection of algorithms has expanded significantly:
Linear regression methods:
– Generalized linear model (GLM) (Nelder, Wedderburn, 1972),
– Generalized additive model (GAM) (Hastie, Tibshirani, 1990);
Machine learning methods:
– Maximum entropy method implemented in the MaxEnt program (Soberón, Peterson, 2005; Phillips et al., 2006; Phillips, Dudik, 2008),
– Random Forest (RF), an ensemble learning method for classification and regression that works by constructing multiple decision trees during training (Breiman, 2001),
– Boosted Regression Trees (BRT),
– Convolutional Neural Networks (CNN) (LeCun et al., 1989),
– Genetic Algorithm for Rule-set Production (GARP) (Stockwell, 1999; Stockwell, Peters, 1999),
– Support Vector Machines (SVM), also known as support-vector networks (Cortes, Vapnik, 1995; Vapnik et al., 1997),
– XGBoost (eXtreme Gradient Boosting, XGB) (Chen, Guestrin, 2016).
MaxEnt and Random Forest are integrated into ArcGIS, supported in R, and available online for Google Earth Engine (GEE) users. In recent years, GEE has become increasingly popular as a resource for SDM (Crego et al., 2022). 4. Fitting the model. This stage is key in SDM.
Having received preliminary modeling data, we evaluate the contribution of multicollinearity and decide how to deal with it, determine how many variables can be included in the model without overfitting, evaluate spatial or temporal autocorrelation and decide how to deal with it, determine the settings of the model or of several models, and choose whether the final result is taken from the best model or from an average of the models. At the same stage, we check the plausibility of the fitted relationships between the species’ points of presence and the environmental variables by comparing coefficients and visually inspecting the plotted response curves. 5. Model evaluation. At this stage, we evaluate the predictive performance of the final model on a set of validation or test data using AUC (ROC) (Fielding, Bell, 1997; Fawcett, 2006; Hosmer, Lemeshow, 2013), TSS (Liu et al., 2005; Allouche et al., 2006), and R² and Kappa (Brownlee, 2016; Zhang et al., 2021). Cross-validation (spatial blocks) is commonly used for this purpose (Roberts et al., 2017; Valavi et al., 2019; Crego et al., 2022). We also select thresholds to binarize predicted probabilities based on cross-validated predictions. 6. Constructing a map of habitats or range. This is the final stage of SDM, during which we convert our predictive model into a raster and obtain a classified image with the percentage probability of the species occurring in the study area for each pixel. We calculate a probability threshold for the species’ presence, which determines the pixels that we include in the final range map, and the size of the habitat area. The expediency of using a buffer depends on the scale of the resulting raster; the smaller the scale, the lower the relevance of the buffer.
Buffer size is usually determined by the mean nearest neighbor distance (MND) and, depending on the modeling’s goals and objectives, is half the MND, equal to it, or twice it. One must always critically evaluate the underlying assumptions in SDM and be aware of the potential limitations associated with a variety of factors: the ability to detect the species, uneven sampling, limitations in the selection of environmental variables, insufficient knowledge of the aspects of the species’ biology needed to identify patterns in its biotopic and territorial preferences, etc. SDM assumes that the species is in equilibrium with its environment, that we know and have carefully selected both the species' points of presence and the environmental data, and that we have included all the major factors that determine the species' range limits. It should be understood that these aspects are not stable for several reasons. First, species, especially predators, respond dynamically to changes in the environment, so they exhibit spatial and temporal dynamics that need to be properly taken into account in the modeling. Important factors that determine a species' response to changes in its habitat are its physiology, demography, ability to disperse, degree of tolerance to urbanization, degree of adaptation to changes in environmental factors, and interspecific interactions. All these factors operate continuously, including here and now, and ignoring them can significantly distort modeling results. Therefore, the ideal option for SDM is to check results in the field and adjust them. Unfortunately, most ornithologists have difficulty using R and desktop GIS, a fact that prevents them from processing the results of their field research in accordance with modern standards.
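The evaluation stage described above (AUC, TSS, and choosing a threshold to binarize predicted probabilities) can be sketched in a few lines. This is an illustrative example only, not the authors' implementation: the scores are invented, maximizing TSS is just one common rule for picking a threshold, and in practice the scores would come from spatially blocked cross-validation folds rather than from a single list.

```python
def auc(presence_scores, absence_scores):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))

def tss(presence_scores, absence_scores, threshold):
    """True Skill Statistic = sensitivity + specificity - 1 at a given threshold."""
    sensitivity = sum(p >= threshold for p in presence_scores) / len(presence_scores)
    specificity = sum(a < threshold for a in absence_scores) / len(absence_scores)
    return sensitivity + specificity - 1.0

def best_threshold(presence_scores, absence_scores):
    """Return the observed score that maximizes TSS (assumed selection rule)."""
    candidates = sorted(set(presence_scores) | set(absence_scores))
    return max(candidates, key=lambda t: tss(presence_scores, absence_scores, t))

# Hypothetical held-out predictions for presence and absence points.
presence = [0.9, 0.8, 0.7, 0.65, 0.4]
absence = [0.6, 0.3, 0.2, 0.1]

threshold = best_threshold(presence, absence)
print(auc(presence, absence), threshold, tss(presence, absence, threshold))
```

Pixels with a predicted probability at or above `threshold` would then be counted as habitat when the range map is built.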
For better implementation of modeling in practice when working with rare species, we have created a software product that allows bird specialists with minimal knowledge of GIS and programming languages, but with a certain understanding of SDM algorithms and abundance assessment, to solve problems related to modeling the distribution, abundance, and survival of rare species. This software product is designed for processing various geodata containing observations of species, obtaining data from GEE rasters, classifying biotopes, estimating population size and survival rates, etc. The main interface of the product is a web interface that allows the user to select the process of interest, enter the necessary data, and receive a link to an archive containing the processing results. Geodata (points, polygons, etc.) can be supplied as csv, shp, or geojson files or entered manually on a map. For algorithms that need data from GEE rasters, a selection field lists the available Earth remote sensing (ERS) products: NASADEM (NASA JPL, 2020), MOD13A1.061 Terra Vegetation Indices 16-Day Global 500m (Didan, 2021), Geomorpho90m (Amatulli et al., 2020), Global Habitat Heterogeneity (Tuanmu, Jetz, 2015), Global Wind Atlas (Badger et al., 2021), WorldClim (Fick, Hijmans, 2017), ERA5-Land Monthly Aggregated – ECMWF Climate Reanalysis (Muñoz Sabater, 2019), ESA WorldCover 10m v100 (Zanaga et al., 2021), Dynamic World V1 (Brown et al., 2022), unclassified satellite data such as atmospherically corrected Landsat 8 Collection 2 surface reflectance (SR) (blue, red, green, near-infrared and shortwave infrared 1 bands with 30 m spatial resolution) and ALOS-2 PALSAR L-band dual-polarization (HH and HV) SAR data, and NDVI and EVI values calculated from Landsat 8 images using the GEE (normalizedDifference) function.
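For readers unfamiliar with the vegetation indices named above, the following sketch shows the arithmetic behind them in plain Python, without calling GEE. It assumes reflectance values already scaled to the 0–1 range; the sample values and the zero-sum fallback are illustrative choices. NDVI is a normalized difference of the near-infrared and red bands, while EVI is computed here with its standard coefficients and is not a plain normalized difference.

```python
def normalized_difference(band_a, band_b):
    """(a - b) / (a + b), the form of index that a normalized difference computes."""
    total = band_a + band_b
    if total == 0:
        return 0.0  # assumption for this sketch: treat an empty pixel as 0
    return (band_a - band_b) / total

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)

def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients (2.5, 6, 7.5, 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical surface reflectance for one vegetated pixel (0-1 scale).
nir, red, blue = 0.5, 0.1, 0.05
print(ndvi(nir, red), evi(nir, red, blue))
```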
To run algorithms that rely on third-party libraries, data is entered in csv files in the formats required by the corresponding libraries. At the current stage, the product includes the following modules: 1) Obtaining data from GEE rasters for given points (the result is a table with the values sampled at the points from the rasters included in the GEE collection); 2) Obtaining a classified raster for a given area and a set of points of presence and absence of a species (training points) using the RF and MaxEnt classifiers based on GEE (for a given area of interest, a set of training points, and selected remote sensing products from GEE, both classifiers produce a classified raster of the area of interest using the appropriate GEE raster methods. It is possible to cross-validate the selected models and evaluate their predictive performance); 3) Three different methods to estimate population size: 3.1) Generation of random points in a regular network – a heuristic algorithm that, based on data on the points of presence of the species and on the surveyed areas, generates random points, simulating the species’ distribution in the general area of interest; 3.2) Distance – a method based on the Distance Sampling model (Thomas et al., 2010; Buckland et al., 2015; Miller et al., 2019) that accepts a file with the necessary variables for points and areas as input and outputs detailed statistics; 3.3) Simple site surveys using calculation of a weighted average indicator for species distribution density (Karyakin, 2004) with calculation of an asymmetric confidence interval (Ravkin, Chelintsev, 1990); 4) Estimation of nest survival based on the RMark library (Laake, 2013). The survival calculation module processes nest survival data using the nest survival model of the RMark library, which can account for various variables from remote sensing data and infers the importance of variables for nest survival.
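To give a sense of what a distance-sampling estimate involves, here is a minimal sketch of the simplest textbook case: a line transect with an untruncated half-normal detection function. It is an assumption-laden illustration, not the Distance module described above, which relies on dedicated libraries with more flexible detection functions, truncation, and covariates. The function name `half_normal_density` and the sample distances are hypothetical.

```python
import math

def half_normal_density(perp_distances_m, transect_length_m):
    """Line-transect density under a half-normal detection function.

    g(x) = exp(-x^2 / (2 * sigma^2)); with no truncation the maximum-likelihood
    estimate is sigma^2 = mean(x^2), the effective strip half-width is
    sigma * sqrt(pi / 2), and density is n / (2 * L * half-width).
    """
    n = len(perp_distances_m)
    sigma_sq = sum(x * x for x in perp_distances_m) / n
    half_width_m = math.sqrt(sigma_sq * math.pi / 2.0)
    density_per_m2 = n / (2.0 * transect_length_m * half_width_m)
    return sigma_sq, half_width_m, density_per_m2 * 1e6  # last value per km2

# Hypothetical perpendicular distances (m) to detected birds on a 1 km transect.
distances = [10.0, 20.0, 30.0, 40.0]
sigma_sq, half_width, density_km2 = half_normal_density(distances, 1000.0)
print(sigma_sq, half_width, density_km2)
```

The key idea is that the count is divided by an effectively surveyed area, which is smaller than the nominal strip because detectability falls with distance.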
The software product is hosted on the servers of organizations recognized as undesirable in Russia, access to which is blocked by Roskomnadzor. The authors are considering options, including creating a clone on a Russian internet resource. This work is carried out with financial support from the Critical Ecosystem Partnership Fund (CEPF) within the framework of the project “Endangered Raptors Conservation on the Indo-Palaearctic Flyway”.
-
Modeling the geographic spread and proliferation of invasive alien plants (IAPs) into new ecosystems using multi-source data and multiple predictive models in the Heuningnes catchment, South Africa
Bhongolethu Mtengwana,Timothy Dube,Bester Tawona Mudereri,Cletah Shoko
DOI: https://doi.org/10.1080/15481603.2021.1903281
2021-04-06
Abstract:The geographic spread and proliferation of Invasive Alien Plants (IAPs) into new ecosystems requires accurate, constant, and frequent monitoring particularly under the changing climate to ensure the integrity and resilience of affected as well as vulnerable ecosystems. This study thus aimed to understand the distribution and shifts of IAPs and the factors influencing such distribution at the catchment scale to minimize their risks and impacts through effective management. Three machine learning Species Distribution Modeling (SDM) techniques, namely, Random Forest (RF), Maximum Entropy (MaxEnt), Boosted Regression Trees (BRT) and their respective ensemble model were used to predict the potential distribution of IAPs within the catchment. The current and future bioclimatic variables, environmental and Sentinel-2 Multispectral Instrument satellite data were used to fit the models to predict areas at risk of IAPs invasions in the Heuningnes catchment, South Africa. The present and two future climatic scenarios from the Community Climate System Model (CCSM4) were considered in modeling the potential distribution of these species. The two future scenarios represented the minimum and maximum atmospheric carbon Representative Concentration Pathways (RCP) 2.6 and 8.5 for 2050 (average for 2041–2060). The results show that IAPs are predicted to expand under the influence of climate change in the catchment. Concurrently, riparian zones, bare areas, and the native vegetation which is rich in biodiversity will greatly be affected. The mean diurnal range (Bio2), warmest quarter maximum temperature (Bio5), and the warmest quarter precipitation (Bio18) were the most important bioclimatic variables in modeling the spatial distribution of IAPs in the catchment. Comparatively, all the models were successful in predicting the potential distribution of IAPs for all the scenarios.
The BRT, MaxEnt, and RF predicted the spatial distribution of IAPs with an Area Under Curve (AUC) of 0.89, 0.92, and 0.94, respectively. The study highlighted the importance of multi-source data and multiple predictive models in predicting the current and potential future IAP distribution. The results from this study provide baseline information for effective land management, planning, and continuous monitoring of the further spread of IAPs within the Heuningnes catchment.
geography, physical,remote sensing
-
Predictive multi‐scale occupancy models at range‐wide extents: Effects of habitat and human disturbance on distributions of wetland birds
Bryan S. Stevens,Courtney J. Conway
DOI: https://doi.org/10.1111/ddi.12995
2019-10-21
Diversity and Distributions
Abstract:Aim: Predicting distributions is fundamental to ecology, yet hindered by spatially restricted sampling, scale‐dependent relationships and detection error associated with field surveys. Predictive species distribution models (SDMs) are nonetheless vital for conservation of many species. We developed a framework for building predictive SDMs with multi‐scale data and used it to develop range‐wide breeding‐season SDMs for 14 marsh bird species of concern. Location: USA. Methods: We built SDMs using data from range‐wide surveys conducted over 14 years, and habitat and disturbance covariates measured at multiple spatial scales. We built hierarchical occupancy models that included heterogeneity in detectability during sampling, and used Bayesian model selection to regulate model complexity (covariates and scales) based explicitly on spatial predictive abilities. We thus integrated model selection for optimizing out‐of‐sample prediction, range‐wide sampling over broad conditions, multi‐scale analyses and scale optimization, and species‐specific detectability for a suite of wide‐ranging species. Results: Distributions of marsh birds were affected by local wetland conditions, but also by agricultural, urban and hydrologic disturbances operating from local scales (100–500 m) to the watershed level. Variables measuring human disturbances improved prediction for most species, and every species was affected by attributes at >1 scale.
Five species showed evidence for continental‐scale range contraction during the study. Main conclusions: We demonstrate how hierarchical occupancy models can be optimized for prediction across a species' range at the extent of a continent while also accounting for imperfect detection, and thus describe a generalizable approach that can be used for any species. We provide the first data‐driven, empirical SDMs built at the range‐wide extent for most of our 14 study species and demonstrate that previous studies focused on local distributions and the effects of fine‐scale wetland vegetation missed important broadscale drivers of occupancy for marsh birds.
ecology,biodiversity conservation
-
The role of remote sensing in species distribution models: a review
Le Wang,Chunyuan Diao,Ying Lu
DOI: https://doi.org/10.1080/01431161.2024.2421949
IF: 3.531
2024-11-06
International Journal of Remote Sensing
Abstract:Species distribution models (SDMs) are invaluable for delineating ecological niches and assessing habitat suitability, facilitating the projection of species distributions across spatial and temporal dimensions. This capability is crucial for conservation planning, habitat management and understanding the impacts of climate change. Remote sensing has emerged as a superior alternative to traditional field surveys in developing SDMs, offering cost-effective, repetitive data collection over comprehensive spatial and temporal scales. Despite the rapid advancements in remote sensing technologies and analytical methods, the specific contributions of remote sensing to SDMs historically, and the potential pathways for its integration with SDMs remain ambiguous. Therefore, our study has set forth two objectives: firstly, to conduct a thorough review of remote sensing's role in SDMs, focusing on environmental predictors, response variables, scalability and validation; secondly, to outline prospective research trajectories for remote sensing within SDMs. Our findings reveal that remote sensing offers a plethora of environmental predictors for SDMs, encompassing climate, topography, land cover and use, spectral metrics and biogeochemical cycles. A variety of remote sensing techniques, including random forest, deep learning and linear unmixing, facilitate the derivation of SDM response variables and the development of species distribution models across diverse scales. Furthermore, remote sensing enables the validation of SDMs through its mapping outputs.
imaging science & photographic technology,remote sensing
-
The interplay of various sources of noise on reliability of species distribution models hinges on ecological specialisation
Alaaeldin Soultan,Kamran Safi
DOI: https://doi.org/10.1371/journal.pone.0187906
IF: 3.7
2017-11-13
PLoS ONE
Abstract:Digitized species occurrence data provide an unprecedented source of information for ecologists and conservationists. Species distribution model (SDM) has become a popular method to utilise these data for understanding the spatial and temporal distribution of species, and for modelling biodiversity patterns. Our objective is to study the impact of noise in species occurrence data (namely sample size and positional accuracy) on the performance and reliability of SDM, considering the multiplicative impact of SDM algorithms, species specialisation, and grid resolution. We created a set of four 'virtual' species characterized by different specialisation levels. For each of these species, we built the suitable habitat models using five algorithms at two grid resolutions, with varying sample sizes and different levels of positional accuracy. We assessed the performance and reliability of the SDM according to classic model evaluation metrics (Area Under the Curve and True Skill Statistic) and model agreement metrics (Overall Concordance Correlation Coefficient and geographic niche overlap) respectively. Our study revealed that species specialisation had by far the most dominant impact on the SDM. In contrast to previous studies, we found that for widespread species, low sample size and low positional accuracy were acceptable, and useful distribution ranges could be predicted with as few as 10 species occurrences. Range predictions for narrow-ranged species, however, were sensitive to sample size and positional accuracy, such that useful distribution ranges required at least 20 species occurrences. Against expectations, the MAXENT algorithm poorly predicted the distribution of specialist species at low sample size.
multidisciplinary sciences
-
The shadow model: how and why small choices in spatially explicit species distribution models affect predictions
Christian J C Commander,Lewis A K Barnett,Eric J Ward,Sean C Anderson,Timothy E Essington
DOI: https://doi.org/10.7717/peerj.12783
IF: 3.061
2022-02-14
PeerJ
Abstract:The use of species distribution models (SDMs) has rapidly increased over the last decade, driven largely by increasing observational evidence of distributional shifts of terrestrial and aquatic populations. These models permit, for example, the quantification of range shifts, the estimation of species co-occurrence, and the association of habitat to species distribution and abundance. The increasing complexity of contemporary SDMs presents new challenges: as the choices among modeling options increase, it is essential to understand how these choices affect model outcomes. Using a combination of original analysis and literature review, we synthesize the effects of three common model choices in semi-parametric predictive process species distribution modeling: model structure, spatial extent of the data, and spatial scale of predictions. To illustrate the effects of these choices, we develop a case study centered around sablefish (Anoplopoma fimbria) distribution on the west coast of the USA. The three modeling choices represent decisions necessary in virtually all ecological applications of these methods, and are important because the consequences of these choices impact derived quantities of interest (e.g., estimates of population size and their management implications). Truncating the spatial extent of data near the observed range edge, or using a model that is misspecified in terms of covariates and spatial and spatiotemporal fields, led to bias in population biomass trends and mean distribution compared to estimates from models using the full dataset and appropriate model structure. In some cases, these suboptimal modeling decisions may be unavoidable, but understanding the tradeoffs of these choices and impacts on predictions is critical.
We illustrate how seemingly small model choices, often made out of necessity or simplicity, can affect scientific advice informing management decisions, potentially leading to erroneous conclusions about changes in abundance or distribution and the precision of such estimates. For example, we show how incorrect decisions could cause overestimation of abundance, which could result in management advice resulting in overfishing. Based on these findings and literature gaps, we outline important frontiers in SDM development.
-
Presence-only species distribution models are sensitive to sample prevalence: Evaluating models using spatial prediction stability and accuracy metrics
Liam Grimmett,Rachel Whitsed,Ana Horta
DOI: https://doi.org/10.1016/j.ecolmodel.2020.109194
IF: 3.1
2020-09-01
Ecological Modelling
Abstract:Species distribution modelling (SDM) is an important tool for ecologists, but different algorithms and different sampling strategies produce different results. Using virtual species with differing characteristics, this study investigated the effect of sampling strategy choices on SDM predictions across multiple algorithms and species, including the impacts of different sample size and prevalence choices, and the effects of validating models using presence and background data as opposed to true absences. We also assessed the consistency of predictions between algorithms, and investigated the effectiveness of using stability assessment of spatial predictions in geographic space to evaluate SDM predictions. Maxent performed most consistently under all scenarios both in regards to performance metrics and spatial prediction stability, and should be considered for most scenarios either on its own or as part of a model ensemble, in particular when true absences are not available. A key recommendation of this study is the use of metrics to assess agreement between replicate predictions as a measure of spatial stability, rather than relying solely on performance metrics such as area under the curve (AUC).
ecology
-
Assessing the reliability of species distribution projections in climate change research
Luca Santini,Ana Benítez‐López,Luigi Maiorano,Mirza Čengić,Mark A. J. Huijbregts
DOI: https://doi.org/10.1111/ddi.13252
2021-02-19
Diversity and Distributions
Abstract:Aim: Forecasting changes in species distribution under future scenarios is one of the most prolific areas of application for species distribution models (SDMs). However, no consensus yet exists on the reliability of such models for drawing conclusions on species' distribution response to changing climate. In this study, we provide an overview of common modelling practices in the field and assess the reliability of model predictions using a virtual species approach. Location: Global. Methods: We first review papers published between 2015 and 2019. Then, we use a virtual species approach and three commonly applied SDM algorithms (GLM, MaxEnt and random forest) to assess the estimated and actual predictive performance of models parameterized with different modelling settings and violations of modelling assumptions. Results: Most SDM papers relied on single models (65%) and small samples (N < 50, 62%), used presence‐only data (85%), binarized models' output (74%) and used a split‐sample validation (94%). Our simulation reveals that the split‐sample validation tends to be over‐optimistic compared to the real performance, whereas spatial block validation provides a more honest estimate, except when datasets are environmentally biased. The binarization of predicted probabilities of presence reduces models' predictive ability considerably. Sample size is one of the main predictors of the real model accuracy, but has little influence on estimated accuracy.
Finally, the inclusion of ecologically irrelevant predictors and the violation of modelling assumptions increases estimated accuracy but decreases real accuracy of model projections, leading to biased estimates of range contraction and expansion. Main conclusions: Our ability to predict future species distribution is low on average, particularly when models' predictions are binarized. A robust validation by spatially independent samples is required, but does not rule out inflation of model accuracy by assumption violation. Our findings call for caution in the application and interpretation of SDM projections under different climates.
biodiversity conservation,ecology
-
Species distribution modelling for plant communities: stacked single species or multivariate modelling approaches?
Emilie B. Henderson,Janet L. Ohmann,Matthew J. Gregory,Heather M. Roberts,Harold Zald
DOI: https://doi.org/10.1111/avsc.12085
2014-01-15
Applied Vegetation Science
Abstract:Aim: Landscape management and conservation planning require maps of vegetation composition and structure over large regions. Species distribution models (SDMs) are often used for individual species, but projects mapping multiple species are rarer. We compare maps of plant community composition assembled by stacking results from many SDMs with multivariate maps constructed using nearest‐neighbor imputation. Location: Western Cascades ecoregion, Oregon and California, USA. Methods: We mapped distributions and abundances of 28 tree species over 4,007,110 ha at 30‐m resolution using three approaches: SDMs using machine learning (random forest) to yield: (1) binary (RF_Bin); (2) basal area (abundance; RF_Abund) predictions; and (3) multi‐species basal area predictions using a nearest‐neighbor imputation variant based on random forest (RF_NN). We evaluated accuracy of binary predictions for all models, compared area mapped with plot‐based areal estimates, assessed species abundance at two spatial scales and evaluated communities for species richness, problematic compositional errors and overall community composition. Results: RF_Bin yielded the strongest binary predictions (median True Skill Statistics; RF_Bin: 0.57, RF_NN: 0.38, RF_Abund: 0.27). Plot‐scale predictions of abundance were poor for RF_Abund and RF_NN (median Agreement Coefficient (AC): −1.77 and −2.28), but strong when summarized over 50‐km radius tessellated hexagons (median AC for both: 0.79). RF_Abund's strength with abundance and weakness with binary predictions stems from predicting small values instead of zeros. The number of zero value predictions from RF_NN was closest to counts of zeros in the plot data. Correspondingly, RF_NN's map‐based species area estimates closely matched plot‐based area estimates. RF_NN also performed best for community‐level accuracy metrics.
Conclusions RF _ NN was the best technique for building a broad‐scale map of diversity and composition because the modelling framework maintained inter‐species relationships from the input plot data. Re‐assembling communities from single variable maps often yielded unrealistic communities. Although RF _ NN rarely excelled at single species predictions of presence or abundance, it was often adequate to many (but not all) applications in both dimensions. We discuss our results in the context of map utility for applications in the fields of ecology, conservation and natural resource management planning. We highlight how RF _ NN is well‐suited for mapping current but not future vegetation.
ecology,plant sciences,forestry
-
Dealing with overprediction in species distribution models: How adding distance constraints can improve model accuracy
Poliana Mendes,Santiago José Elías Velazco,André Felipe Alves de Andrade,Paulo De Marco
DOI: https://doi.org/10.1016/j.ecolmodel.2020.109180
IF: 3.1
2020-09-01
Ecological Modelling
Abstract:Species distribution models can be affected by overprediction when dispersal movement is not incorporated into the modelling process. We compared the efficiency of seven methods that take into account spatial constraints to reduce overprediction when using four algorithms for species distribution models. By using a virtual ecologist approach, we were able to measure the accuracy of each model in predicting actual species distributions. We built 40 virtual species distributions within the Neotropical realm. Then, we randomly sampled 50 occurrences that were used in seven spatially restricted species distribution models (hereafter called M-SDMs) and a non-spatially restricted ecological niche model (ENM). We used four algorithms; Maximum Entropy, Generalized Linear Models, Random Forest, and Support Vector Machine. M-SDM methods were divided into a priori methods, in which spatial restrictions were inserted with environmental variables in the modelling process, and a posteriori methods, in which reachable and suitable areas were overlapped. M-SDM efficiency was obtained by calculating the difference in commission and omission errors between M-SDMs and ENMs. We used linear mixed-effects models to test if differences in commission and omission errors varied among the M-SDMs and algorithms. Our results indicate that overall M-SDMs reduce overprediction with no increase in underprediction compared to ENMs with few exceptions, such as a priori methods combined with the Support Vector Machine algorithm. There is a high variation in modelling performance among species, but there were only a few cases in which overprediction or underprediction increased. We only compared methods that do not require species dispersal data, guaranteeing that they can be applied to less-studied species. 
We advocate that species distribution modellers should not ignore spatial constraints, especially because they can be included in models at low costs but high benefits in terms of overprediction reduction.
ecology
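The a posteriori idea described in the abstract above (overlapping reachable and suitable areas) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grid cells, the distance threshold `max_dist`, and the function names `constrain_by_distance` and `commission_omission` are hypothetical, and distance to the nearest occurrence is only one assumed way to define a "reachable" area.

```python
import math

def constrain_by_distance(suitable, occurrences, max_dist):
    """Keep a suitable cell only if it lies within max_dist of some occurrence.

    Assumption: 'reachable' is approximated by Euclidean distance to the
    nearest occurrence; real studies may define it differently.
    """
    return {
        cell for cell in suitable
        if any(math.dist(cell, occ) <= max_dist for occ in occurrences)
    }

def commission_omission(predicted, truth):
    """Count cells predicted but not occupied (commission) and occupied but missed (omission)."""
    return len(predicted - truth), len(truth - predicted)

# Toy virtual species: the true range is a 2x2 block of cells.
truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
# The unconstrained model also marks a distant, environmentally similar patch.
suitable = truth | {(9, 9), (9, 8)}
occurrences = [(0, 0), (1, 1)]

restricted = constrain_by_distance(suitable, occurrences, max_dist=2.0)
enm_errors = commission_omission(suitable, truth)     # unconstrained model
msdm_errors = commission_omission(restricted, truth)  # distance-constrained model
print(enm_errors, msdm_errors)
```

In this toy case the constraint removes the distant patch (commission errors) without dropping any occupied cell; with a threshold that is too small it would start to add omission errors instead.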
-
The potential for species distribution models to distinguish source populations from sinks
Bilgecan Şen,Christian Che-Castaldo,H Reşit Akçakaya
DOI: https://doi.org/10.1111/1365-2656.14201
Abstract:While species distribution models (SDM) are frequently used to predict species occurrences to help inform conservation management, there is limited evidence evaluating whether habitat suitability can reliably predict intrinsic growth rates or distinguish source populations from sinks. Filling this knowledge gap is critical for conservation science, as applications of SDMs for management purposes ultimately depend on these typically unobserved population or metapopulation dynamics. Using linear regression, we associated previously published population level estimates of intrinsic growth and abundance derived from a Bayesian analysis of mark-recapture data for 17 bird species found in the contiguous United States with SDM habitat suitability estimates fitted here to opportunistic data for these same species. We then used the area under the ROC curve (AUC) to measure how well SDMs can distinguish populations categorized as sources and sinks. We built SDMs using two different approaches, boosted regression trees (BRT) and generalized linear models (GLM), and compared their source/sink predictive performance. Each SDM was built with presence points obtained from eBird (a web-available database) and 10 environmental variables previously selected to model intrinsic growth rates and abundance for these species. We show that SDMs built with opportunistic data are poor predictors of species demography in general; both BRT and GLM explained very little spatial variation of intrinsic growth rate and population abundance (median R2 across 17 species was close to 0.1 for both SDM methods). SDMs, however, estimated higher suitability for source populations as compared to sinks. Out of 13 species which had both source and sink populations, both BRT and GLM had AUC values greater than 0.7 for 7 species when discriminating between sources and sinks. 
Habitat suitability has the potential to be a useful measure to indicate a population's ability to sustain itself as a source population; however, more research on a diverse set of taxa is essential to fully explore this potential. This interpretation of habitat suitability can be particularly useful for conservation practice, and identification of explicit cases of when and how SDMs fail to match population demography can be informative for advancing ecological theory.
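As a rough illustration of how AUC can measure source/sink discrimination, the Python sketch below computes AUC as the share of (source, sink) pairs in which the source population has the higher suitability, counting ties as one half. The suitability values are invented for the example, and the function name `auc` is an assumption rather than anything taken from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random negative.

    Ties contribute 0.5. Here 'positive' means source population and
    'negative' means sink population.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical SDM suitability estimates at population sites.
source_suitability = [0.8, 0.6, 0.7]
sink_suitability = [0.5, 0.65, 0.3]

value = auc(source_suitability, sink_suitability)
print(round(value, 3))
```

A value of 0.5 means suitability does not separate sources from sinks at all, which is why thresholds such as 0.7 are used as a loose benchmark for useful discrimination.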
-
Study extent influences the predictions and performance of species distribution models: a case study of six amphibian species at the edge of their geographic distributions in western Canada
Bergman, Jayna C.
DOI: https://doi.org/10.1007/s10531-024-02953-3
IF: 3.4
2024-10-19
Biodiversity and Conservation
Abstract:Species distribution models (SDMs) are often generated to inform conservation plans. When developed for use in spatially-restricted areas, such as protected areas, investigators often make arbitrary decisions as to the geographic extent from which locality data to inform the model are drawn (i.e. the "study extent" of the model). However, there has been little attention to the impacts of this decision on model predictions. Here we explore the impacts of study extent on SDM predictions of (i) suitable habitat for or (ii) the actual occurrence of individual species, as well as on (iii) the identification of sites that could support multiple species (i.e. from stacked-SDMs). Focusing on six amphibian species of conservation concern at the edge of their range in western Canada, we generated SDMs using range-wide, ecoregion, and political study extents and compared the alternative predictions for each species in one of two national parks of interest. Differences in model predictions were substantial, with percent agreement among models developed with different extents as low as 10% for one of the species. Study extent also influenced the ability of models to predict independent occurrence at the edge of the range, although most models performed poorly in this regard (AUC < 0.7). Finally, study extent influenced stacked predictions, suggesting that uncertainty in individual species predictions muddies interpretation of SDMs at the community-level. Importantly, results varied across species and region, precluding simple recommendations for choosing a study extent; instead, uncertainty arising from this decision should be quantified before using SDMs in conservation planning.
environmental sciences,biodiversity conservation,ecology
-
Dealing with area‐to‐point spatial misalignment in species distribution models
Bastien Mourguiart,Mathieu Chevalier,Martin Marzloff,Nathalie Caill‐Milly,Kerrie Mengersen,Benoit Liquet
DOI: https://doi.org/10.1111/ecog.07104
IF: 5.9
2024-03-24
Ecography
Abstract:Species distribution models (SDMs) are extensively used to estimate species–environment relationships (SERs) and predict species distribution across space and time. For this purpose, it is key to choose relevant spatial grains for predictor and response variables at the onset of the modelling process. However, environmental variables are often derived from large‐scale climate models at a grain that can be coarser than the one of the response variable. Such area‐to‐point spatial misalignment can bias estimates of SER and jeopardise the robustness of predictions. We used a virtual species approach, running simulations across different levels of area‐to‐point spatial misalignment to seek statistical solutions to this problem. We specifically compared accuracy of SER estimates and predictive performances, assessed across different degrees of spatial heterogeneity in environmental conditions, of three SDMs: a GLM, a spatial GLM and a Berkson error model (BEM) that accounts for fine‐grain environmental heterogeneity within coarse‐grain cells. Only the BEM accurately estimates SER from relatively coarse‐grain environmental data (up to 50 times coarser than the response grain), while the two GLMs provide flattened SER. However, all three models perform poorly when predicting from coarse‐grain data, particularly in environments that are more heterogeneous than the training conditions. Conversely, decreasing environmental heterogeneity relative to the training dataset reduces the predictive biases. Because predictions are made from covariate‐grain data, the BEM displays lower predictive performance than the two GLMs. Thus, standard model selection methods would fail to select the model that best estimates SERs (here, the BEM), which could lead to false interpretations about the environmental drivers of species distributions. 
Overall, we conclude that the BEM, because it can robustly estimate SER at the response grain, holds great promise to overcome area‐to‐point misalignment.
biodiversity conservation,ecology
-
Species distribution modelling supports the study of past, present and future biogeographies
Janet Franklin
DOI: https://doi.org/10.1111/jbi.14617
2023-04-12
Journal of Biogeography
Abstract:Species distribution modelling (SDM), also called environmental or ecological niche modelling, has developed over the last 30 years as a widely used tool in core areas of biogeography including historical biogeography, studies of diversity patterns, studies of species ranges, ecoregional classification, conservation assessment and projecting future global change impacts. In the 50th anniversary year of Journal of Biogeography, I reflect on developments in species distribution modelling, illustrate how embedded the methodology has become in all areas of biogeography and speculate on future directions in the field. Challenges to species distribution modelling raised in this journal in 2006 have been addressed to a significant degree. Those challenges are clarification of the niche concept; improved sample design for species occurrence data; model parameterization; predictor selection; assessing model performance and transferability; and integrating correlative and process models of species distributions. SDM is used, often in conjunction with other evidence, to understand past species range dynamics, identify patterns and drivers of biological diversity, identify drivers of species range limits, define and delineate ecoregions, estimate the distributions of biodiversity elements in relation to protected status and to prioritize conservation action, and to forecast species range shifts in response to climate change and other global change scenarios. Areas of progress in SDM that may become more widely accessible and useful tools in biogeography include genetically informed models and community distribution models.
ecology,geography, physical
-
Measuring and comparing the accuracy of species distribution models with presence–absence data
M. White,Canran Liu,G. Newell
DOI: https://doi.org/10.1111/J.1600-0587.2010.06354.X
2011-04-01
Abstract:Species distribution models have been widely used to predict species distributions for various purposes, including conservation planning, and climate change impact assessment. The success of these applications relies heavily on the accuracy of the models. Various measures have been proposed to assess the accuracy of the models. Rigorous statistical analysis should be incorporated in model accuracy assessment. However, since relevant information about the statistical properties of accuracy measures is scattered across various disciplines, ecologists find it difficult to select the most appropriate ones for their research. In this paper, we review accuracy measures that are currently used in species distribution modelling (SDM), and introduce additional metrics that have potential applications in SDM. For the commonly used measures (which are also intensively studied by statisticians), including overall accuracy, sensitivity, specificity, kappa, and area and partial area under the ROC curves, promising methods to construct confidence intervals and statistically compare the accuracy between two models are given. For other accuracy measures, methods to estimate standard errors are given, which can be used to construct approximate confidence intervals. We also suggest that as general tools, computer-intensive methods, especially bootstrap and randomization methods can be used in constructing confidence intervals and statistical tests if suitable analytic methods cannot be found. Usually, these computer-intensive methods provide robust results.
Biology,Environmental Science
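Several of the measures named in the abstract above (overall accuracy, sensitivity, specificity, kappa) follow directly from a presence–absence confusion matrix, and a bootstrap gives an approximate confidence interval when no analytic formula is at hand. The Python sketch below is illustrative only: the function names, the percentile method, and the toy data are assumptions, not the procedures recommended in the paper.

```python
import random

def accuracy_measures(observed, predicted):
    """Compute common accuracy measures from 0/1 observed and predicted values."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    n = tp + tn + fp + fn
    overall = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "overall": overall,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (overall - expected) / (1 - expected),
    }

def bootstrap_ci(observed, predicted, measure, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for one measure (assumed, simple variant).

    Resamples sites with replacement; resamples lacking presences or absences
    are redrawn, so the observed data must contain both classes.
    """
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        obs = [observed[i] for i in idx]
        if 0 < sum(obs) < n:
            pred = [predicted[i] for i in idx]
            stats.append(accuracy_measures(obs, pred)[measure])
    stats.sort()
    lower = stats[round((alpha / 2) * (n_boot - 1))]
    upper = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
measures = accuracy_measures(observed, predicted)
kappa_ci = bootstrap_ci(observed, predicted, "kappa")
print(measures, kappa_ci)
```

With only ten sites the interval is very wide, which is itself a useful reminder that a point estimate of accuracy says little without its uncertainty.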
-
Data‐centric species distribution modeling: Impacts of modeler decisions in a case study of invasive European frog‐bit
Sara E. Hansen,Michael J. Monfils,Rachel A. Hackett,Ryan T. Goebel,Anna K. Monfils
DOI: https://doi.org/10.1002/aps3.11573
2024-03-13
Applications in Plant Sciences
Abstract:Premise: Species distribution models (SDMs) are widely utilized to guide conservation decisions. The complexity of available data and SDM methodologies necessitates considerations of how data are chosen and processed for modeling to enhance model accuracy and support biological interpretations and ecological applications. Methods: We built SDMs for the invasive aquatic plant European frog‐bit using aggregated and field data that span multiple scales, data sources, and data types. We tested how model results were affected by five modeler decision points: the exclusion of (1) missing and (2) correlated data and the (3) scale (large‐scale aggregated data or systematic field data), (4) source (specimens or observations), and (5) type (presence‐background or presence‐absence) of occurrence data. Results: Decisions about the exclusion of missing and correlated data, as well as the scale and type of occurrence data, significantly affected metrics of model performance. The source and type of occurrence data led to differences in the importance of specific explanatory variables as drivers of species distribution and predicted probability of suitable habitat. Discussion: Our findings relative to European frog‐bit illustrate how specific data selection and processing decisions can influence the outcomes and interpretation of SDMs. Data‐centric protocols that incorporate data exploration into model building can help ensure models are reproducible and can be accurately interpreted in light of biological questions.
plant sciences
-
Species-Distribution Modeling: Advantages and Limitations of Its Application. 1. General Approaches
A. A. Lissovsky,S. V. Dudov,E. V. Obolenskaya
DOI: https://doi.org/10.1134/S2079086421030075
2021-06-04
Abstract:For a long time, studies of the distribution of living beings in a geographical space were performed only with empirical methods. A change in the view of a species distribution as a projection of a Hutchinsonian ecological niche led to the formation of the discipline of ecological modeling of the species distribution, which switched faunistics/floristics from data accumulation to a full-fledged scientific industry with experiment planning and result verification. The various methods of species-distribution modeling make it possible to analyze the patterns of the geographical distribution of organisms in the presence of methodological challenges: nonrandomness of the occurrence data, inhomogeneity of the collection efforts, landscape heterogeneity in different scales, etc. The results of species-distribution modeling represent spatially continuous data of habitat suitability and are valuable not only for studies of the habitats themselves but also for a number of disciplines that involve species distributions.
biology
-
How sensitive are species distribution models to different background point selection strategies? A test with species at various equilibrium levels
Bart Steen,Olivier Broennimann,Luigi Maiorano,Antoine Guisan
DOI: https://doi.org/10.1016/j.ecolmodel.2024.110754
IF: 3.1
2024-05-19
Ecological Modelling
Abstract:Species distribution models (SDMs) have become central tools in ecology and biogeography. Although they can be fitted with different types of species data (e.g. presence-absence, abundance), the most common approach, based on data from large species repositories, is to use simple occurrences (i.e. presence-only) combined with background points (BP; also called pseudo-absences). But how should we sample these background points, and how does this choice affect SDMs? In most studies so far, BP were sampled randomly in geographic space, yet theory rather suggests, if a species is at equilibrium, that it is better to sample them in a stratified way in environmental space. However, this potential improvement of SDM predictions has never been tested. Furthermore, a typical assumption behind SDMs is that the modelled species are at equilibrium with their environment. But how do these models perform when species are in disequilibrium, as is the case for most invasive species? To answer these questions, we selected 30 different species (10 insects, 10 mammals and 10 plants; for each group 5 were invasive and 5 were considered at equilibrium) and for each we calibrated SDMs with different types of background selections: random in environmental space, random-stratified in environmental space, random in geographic space, and random-stratified in geographic space. For each SDM we assessed both predictive performance using standard metrics and their stability using a new approach that compares the model's habitat suitability projection with those of a SDM calibrated with virtual occurrence data generated from the most suitable areas. Finally, we compared the predictive performance of species distribution models of invasive alien (disequilibrium) species versus native (equilibrium) species by comparing model stability and performance metrics of the two groups. 
We found that sampling BP in a stratified-random way in environmental space yields the highest performance metrics, and that sampling fully randomly in environmental space yields the most stable models. This has implications for the use of SDMs in conservation, as the classical and frequently used fully random in geographic space BP are found to produce both less accurate and less stable models. Our results indicate that the best approach is to use stratified random in environmental space BP sampling if accuracy is essential, and fully random in environmental space BP sampling if model stability is essential.
ecology
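To make the contrast between background sampling strategies concrete, here is a minimal Python sketch of random-stratified sampling in environmental space. The abstract does not specify the stratification, so everything here is an assumption: a single environmental variable, equal-width strata, an equal quota per stratum, and the hypothetical function name `stratified_background`.

```python
import random

def stratified_background(cells, n_points, n_strata=5, seed=0):
    """Sample background cells evenly across strata of one environmental variable.

    cells: list of (cell_id, env_value) pairs.
    Returns a list of sampled cell ids; it can be shorter than n_points
    when some strata hold fewer cells than their quota.
    """
    rng = random.Random(seed)
    values = [v for _, v in cells]
    low, high = min(values), max(values)
    width = (high - low) / n_strata or 1.0  # guard against a constant variable
    strata = [[] for _ in range(n_strata)]
    for cell_id, v in cells:
        k = min(int((v - low) / width), n_strata - 1)
        strata[k].append(cell_id)
    quota = n_points // n_strata
    sample = []
    for members in strata:
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

# Toy landscape: 90 cells in a common environment, 10 cells in a rare one.
common = [(i, i * 0.01) for i in range(90)]
rare = [(90 + j, 5.0 + j * 0.5) for j in range(10)]
sample = stratified_background(common + rare, n_points=20)
n_rare = sum(1 for cell_id in sample if cell_id >= 90)
print(len(sample), n_rare)
```

Sampling the same landscape at random in geographic space would draw about 10% of background points from the rare environment, whereas the stratified draw covers the environmental gradient much more evenly.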
-
Using DEM to Predict Abies Faxoniana and Quercus Aquifolioides Distributions in the Upstream Catchment Basin of the Min River in Southwest China
Lei Zhang,Shirong Liu,Pengsen Sun,Tongli Wang,Guangyu Wang,Linlin Wang,Xudong Zhang
DOI: https://doi.org/10.1016/j.ecolind.2016.04.008
IF: 6.9
2016-01-01
Ecological Indicators
Abstract:The species distribution model (SDM), which is used to spatially predict species distributions, can also identify the probable causes of the location of certain species (i.e. the mathematical description of habitat requirements). Therefore, SDM has the potential to guide resource management and biodiversity conservation. In the topographically complex terrain, SDMs are often complicated by the lack of environmental data; however, the first information that is typically obtained for these analyses is a topographic map. Here, the possibility of using 16 predictor variables derived from the digital elevation model (DEM) to model the distributions of Abies faxoniana and Quercus aquifolioides in the mountainous upstream catchment basin of the Min River (UCBM) in southwest China was investigated. In particular, with the ensemble modeling approach based on eight niche models and nine model-training and -testing datasets, changes in model performance and shifts in the explanatory power of the predictor variable over five different levels of spatial resolution (30 m, 90 m, 120 m, 240 m, 900 m) were assessed. Almost all models succeed in predicting the distributions of both species, although predictive accuracies differed significantly among spatial scales and model classes. On average, model accuracies increased to the highest level at the meso-scale (120 m and 240 m for A. faxoniana and Q. aquifolioides, respectively) and then decreased as resolution became coarser, indicating that high spatial resolution does not imply a better model. The relative importance rankings for each topographical variable were consistent across all spatial scales, but their explanatory powers did differ significantly among spatial scales.
Elevation and terrain-distributed solar radiation for growing season (SRG) drive the distributions of A. faxoniana and Q. aquifolioides with a much higher level of confidence than other predictors across all spatial scales; the former tended to decrease, and the latter tended to increase when spatial resolution became coarse. Our findings confirm that DEM can be used exclusively and effectively to predict species distribution. Multi-scale analysis is needed to detect highly subtle variations in species habitat requirements, and to select the spatial scale that corresponds to known spatial characteristics of the species habitat. This has broad implications for distribution modeling of species in rugged terrain.
-
Matching Data Types to the Objectives of Species Distribution Modeling: An Evaluation With Marine Fish Species
Jing Luan,Chongliang Zhang,Yupeng Ji,Binduo Xu,Ying Xue,Yiping Ren
DOI: https://doi.org/10.3389/fmars.2021.771071
IF: 5.247
2021-10-22
Frontiers in Marine Science
Abstract:Species distribution model (SDM) is a crucial tool for forecasting ranges of species and mirroring habitat references and quality. Different types of species distribution data have been commonly used in SDMs regarding different purposes and availability, whereas, the influences of data types on model performances have not been well understood. This study considered three data types characterized by different levels of organism information and cost in data acquisitions, namely presence/absence (P/A), ordinal data, and abundance data. We developed a range of distribution models for nine demersal species in the coastal waters of Shandong Peninsula, China, using two modeling algorithms [the Generalized Additive Model (GAM) and Random Forest]. Firstly, we evaluated the performances of all models on predicting species occurrence (i.e., habitat suitability or range boundaries), and then compared the models built with ordinal data and abundance data on projecting ordinal predictions (i.e., relative density or habitat quality). Their predictive abilities were assessed through cross-validation tests with diverse performance measurements. Overall, no data type is superior in all situations, but combined with two algorithms, the abundance data slightly outperformed the ordinal data and P/A data unexpectedly exerted reliable performances. Specifically, the effectiveness of data type for two application purposes of SDMs substantially varied with modeling algorithms, revealing that GAMs always benefit most from ordinal data and the opposite was true for Random Forest. For some small resident organisms with moderate prevalence, rough distribution data might be adopted for providing reliable projections. Our findings highlight the importance of clarifying the objectives of SDMs when choosing data types for species distribution modeling.
marine & freshwater biology
-
MaxEnt brings comparable results when the input data are being completed; Model parameterization of four species distribution models
Mohsen Ahmadi,Mahmoud‐Reza Hemami,Mohammad Kaboli,Farzin Shabani
DOI: https://doi.org/10.1002/ece3.9827
IF: 3.167
2023-02-18
Ecology and Evolution
Abstract:The input data for species distribution modeling (SDM) are always completing, and data of the unknown and range‐restricted species are mostly spatially imbalanced‐biased. Model parameterization is necessary to improve the predictive performance of the species distribution models. The MaxEnt model provides reproducible results when the input data are being developed. Species distribution models (SDMs) are practical tools to assess the habitat suitability of species with numerous applications in environmental management and conservation planning. The manipulation of the input data to deal with their spatial bias is one of the advantageous methods to enhance the performance of SDMs. However, the development of a model parameterization approach covering different SDMs to achieve well‐performing models has rarely been implemented. We integrated input data manipulation and model tuning for four commonly‐used SDMs: generalized linear model (GLM), gradient boosted model (GBM), random forest (RF), and maximum entropy (MaxEnt), and compared their predictive performance to model geographically imbalanced‐biased data of a rare species complex of mountain vipers. Models were tuned up based on a range of model‐specific parameters considering two background selection methods: random and background weighting schemes. The performance of the fine‐tuned models was assessed based on recently identified localities of the species. The results indicated that although the fine‐tuned version of all models shows great performance in predicting training data (AUC > 0.9 and TSS > 0.5), they produce different results in classifying out‐of‐bag data. The GBM and RF with higher sensitivity of training data showed more different performances. The GLM, despite having high predictive performance for test data, showed lower specificity. 
It was only the MaxEnt model that showed high predictive performance and comparable results for identifying test data in both random and background weighting procedures. Our results highlight that while GBM and RF are prone to overfitting training data and GLM over‐predicts nonsampled areas, MaxEnt is capable of producing results that are both predictable (extrapolative) and complex (interpolative). We discuss the assumptions of each model and conclude that MaxEnt could be considered as a practical method to cope with imbalanced‐biased data in species distribution modeling approaches.
ecology,evolutionary biology