StatEcoNet: Statistical Ecology Neural Networks for Species Distribution Modeling

Eugene Seo, Rebecca A. Hutchinson, Xiao Fu, Chelsea Li, Tyler A. Hallman, John Kilbride, W. Douglas Robinson
DOI: https://doi.org/10.48550/arXiv.2102.08534
2021-02-18
Abstract: This paper focuses on a core task in computational sustainability and statistical ecology: species distribution modeling (SDM). In SDM, the occurrence pattern of a species on a landscape is predicted by environmental features based on observations at a set of locations. At first, SDM may appear to be a binary classification problem, and one might be inclined to employ classic tools (e.g., logistic regression, support vector machines, neural networks) to tackle it. However, wildlife surveys introduce structured noise (especially under-counting) in the species observations. If unaccounted for, these observation errors systematically bias SDMs. To address the unique challenges of SDM, this paper proposes a framework called StatEcoNet. Specifically, this work employs a graphical generative model in statistical ecology to serve as the skeleton of the proposed computational framework and carefully integrates neural networks under the framework. The advantages of StatEcoNet over related approaches are demonstrated on simulated datasets as well as bird species data. Since SDMs are critical tools for ecological science and natural resource management, StatEcoNet may offer boosted computational and analytical powers to a wide range of applications that have significant social impacts, e.g., the study and conservation of threatened species.
Machine Learning, Populations and Evolution
What problem does this paper attempt to address?
This paper addresses several challenges specific to species distribution modeling (SDM). SDM aims to predict the occurrence pattern of a species on a landscape from environmental features, but the task faces four main difficulties:

1. **Imperfect detection**: Wildlife surveys introduce structured noise, especially under-counting. Because of poor observation conditions, species behavior, or limited survey effort, individuals that are present may go unrecorded. If these observation errors are not corrected, they systematically bias SDMs.
2. **Complex environmental responses**: Species respond to their environment in complex ways, so the model must handle multiple input variables and represent non-linear relationships.
3. **Model interpretability**: To translate model conclusions into meaningful scientific insights and effective management policies, the model must be as interpretable as possible.
4. **Small datasets**: Compared with many other areas of machine learning, SDMs are usually built from small datasets, with only a few hundred samples rather than tens of thousands.

To address these challenges, the paper proposes a framework named StatEcoNet. The framework uses a graphical generative model from statistical ecology as its skeleton and integrates neural networks into it, so that it can account for imperfect detection while representing non-linear relationships between the environment and the species. Its main contributions are:

- **Combining statistical models and neural networks**: A latent variable model captures the effect of imperfect detection, while neural networks capture the complex non-linear relationships between the environment and the species.
- **Feature selection**: A strategy based on group-sparse regularization selects the features relevant to occupancy and to detection probability, improving both the interpretability and the performance of the model.
- **Efficient training**: A subgradient descent algorithm performs maximum-likelihood estimation of the model parameters.

With these components, StatEcoNet aims to provide a more powerful and accurate species distribution modeling tool for applications such as ecological research and natural resource management.
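To make the combination of a latent variable model, neural networks, and group-sparse regularization more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard site-occupancy formulation from statistical ecology: a site is occupied with probability `psi`, and an occupied site yields a detection on each visit with probability `p`. The function names, the one-hidden-layer network, the choice of one regularization group per input feature, and the weight `lam` are all illustrative assumptions that do not come from the summary above.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def mlp_prob(X, W1, b1, w2, b2):
    """One-hidden-layer network mapping features to a probability (assumed architecture)."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU hidden layer
    return sigmoid(hidden @ w2 + b2)


def occupancy_nll(psi, p, y, eps=1e-12):
    """Negative log-likelihood of a site-occupancy model.

    psi: (n_sites,) occupancy probabilities
    p:   (n_sites, n_visits) detection probabilities given occupancy
    y:   (n_sites, n_visits) binary detection histories
    """
    # log P(y_i | site i occupied): independent Bernoulli detections per visit
    log_det = np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps), axis=1)
    # An unoccupied site can only produce an all-zero detection history
    never_detected = y.sum(axis=1) == 0
    # Marginalize over the latent occupancy state
    lik = psi * np.exp(log_det) + (1 - psi) * never_detected
    return -np.sum(np.log(lik + eps))


def group_sparse_penalty(W1):
    """Sum of L2 norms of first-layer weight rows, one group per input feature."""
    return np.sum(np.linalg.norm(W1, axis=1))


# Toy usage with random data (all sizes and values are hypothetical)
rng = np.random.default_rng(0)
n_sites, n_visits, d_occ, d_det, n_hidden = 5, 3, 4, 2, 8
X_occ = rng.normal(size=(n_sites, d_occ))            # site-level features
X_det = rng.normal(size=(n_sites, n_visits, d_det))  # visit-level features
y = rng.integers(0, 2, size=(n_sites, n_visits))     # detection histories

W1_occ = rng.normal(size=(d_occ, n_hidden))
W1_det = rng.normal(size=(d_det, n_hidden))
psi = mlp_prob(X_occ, W1_occ, np.zeros(n_hidden), rng.normal(size=n_hidden), 0.0)
p = mlp_prob(X_det, W1_det, np.zeros(n_hidden), rng.normal(size=n_hidden), 0.0)

lam = 0.01  # regularization weight (hypothetical value)
loss = occupancy_nll(psi, p, y) + lam * (
    group_sparse_penalty(W1_occ) + group_sparse_penalty(W1_det)
)
```

In this sketch, driving an entire row of `W1_occ` or `W1_det` to zero removes the corresponding input feature from the network, which is how a group penalty of this kind can act as feature selection. The row norm is not differentiable at zero, which is one reason a subgradient method is a natural fit for minimizing such an objective.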