Machine learning algorithms to infer trait-matching and predict species interactions in ecological networks

Maximilian Pichler, Virginie Boreux, Alexandra-Maria Klein, Matthias Schleuning, Florian Hartig

DOI: https://doi.org/10.1111/2041-210X.13329

2019-11-05

Abstract: Ecologists have long suspected that species are more likely to interact if their traits match in a particular way. For example, a pollination interaction may be more likely if the proportions of a bee's tongue fit a plant's flower shape. Empirical estimates of the importance of trait-matching for determining species interactions, however, vary significantly among different types of ecological networks. Here, we show that ambiguity among empirical trait-matching studies may have arisen at least in part from using overly simple statistical models. Using simulated and real data, we contrast conventional generalized linear models (GLM) with more flexible Machine Learning (ML) models (Random Forest, Boosted Regression Trees, Deep Neural Networks, Convolutional Neural Networks, Support Vector Machines, naive Bayes, and k-Nearest-Neighbor), testing their ability to predict species interactions based on traits, and infer trait combinations causally responsible for species interactions. We find that the best ML models can successfully predict species interactions in plant-pollinator networks, outperforming GLMs by a substantial margin. Our results also demonstrate that ML models can better identify the causally responsible trait-matching combinations than GLMs. In two case studies, the best ML models successfully predicted species interactions in a global plant-pollinator database and inferred ecologically plausible trait-matching rules for a plant-hummingbird network, without any prior assumptions. We conclude that flexible ML models offer many advantages over traditional regression models for understanding interaction networks. We anticipate that these results extrapolate to other ecological network types. More generally, our results highlight the potential of machine learning and artificial intelligence for inference in ecology, beyond standard tasks such as image or pattern recognition.

Populations and Evolution, Machine Learning, Quantitative Methods

What problem does this paper attempt to address?

The paper asks **how machine-learning algorithms can predict species interactions more accurately and infer the trait-matching combinations that lead to these interactions**. Specifically, the authors examine whether species interactions in ecological networks, especially plant-pollinator networks, can be predicted from how well the species' traits match. Traditional statistical models such as the generalized linear model (GLM) have limitations for this task, so the authors compare them with several machine-learning models (Random Forest, Boosted Regression Trees, Deep Neural Networks, and others) to improve prediction accuracy and interpretability.

### Background and Problems of the Paper

1. **Importance of Species Interactions**
   - Ecologists have long suspected that species are more likely to interact if certain of their traits match. For example, pollination is more likely when the proportions of a bee's tongue fit the shape of a flower.
   - However, empirical estimates of how important trait matching is for species interactions vary significantly among different types of ecological networks.
2. **Limitations of Existing Research**
   - Most previous studies used simple statistical models such as GLMs, which may be too simple to capture complex interactions between traits.
   - Machine-learning models (Random Forest, Deep Neural Networks, etc.) are more flexible and predict better, but they are harder to interpret.
3. **Research Purposes**
   - **Evaluate the prediction performance of different machine-learning models**: compare how well the models predict species interactions in plant-pollinator networks, using simulated and real data.
   - **Infer causal trait combinations**: use methods such as the H-statistic to extract, from the fitted machine-learning models, the trait combinations causally responsible for species interactions.
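To make the H-statistic idea concrete, here is a minimal sketch of Friedman and Popescu's pairwise H² computed from partial dependence functions. It is not the authors' implementation: the function names (`partial_dependence`, `h_squared`), the two toy prediction functions, and the uniform random traits are all assumptions chosen for illustration, and a plain Python function stands in for a fitted model.

```python
import math
import random


def partial_dependence(predict, rows, features, values):
    """Average prediction when `features` are fixed to `values` in every row."""
    total = 0.0
    for row in rows:
        modified = list(row)
        for f, v in zip(features, values):
            modified[f] = v
        total += predict(modified)
    return total / len(rows)


def centered(xs):
    """Subtract the mean so each partial dependence function has mean zero."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def h_squared(predict, rows, j, k):
    """Pairwise H^2: share of the joint partial dependence of traits j and k
    that is not explained by the sum of their one-trait partial dependences."""
    pd_jk = centered([partial_dependence(predict, rows, (j, k), (r[j], r[k])) for r in rows])
    pd_j = centered([partial_dependence(predict, rows, (j,), (r[j],)) for r in rows])
    pd_k = centered([partial_dependence(predict, rows, (k,), (r[k],)) for r in rows])
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical data: column 0 is a pollinator trait, column 1 a plant trait.
random.seed(1)
rows = [(random.random(), random.random()) for _ in range(40)]


def additive(row):
    """Toy model with no trait matching: each trait acts on its own."""
    return row[0] + row[1]


def matching(row):
    """Toy model with trait matching: high only when the two traits are close."""
    return math.exp(-((row[0] - row[1]) ** 2) / (2 * 0.2 ** 2))


print("H^2 additive:", h_squared(additive, rows, 0, 1))
print("H^2 matching:", h_squared(matching, rows, 0, 1))
```

In this toy setting the additive function yields an H² of essentially zero, while the trait-matching function yields a value close to one. With a real fitted model, `predict` would be the model's prediction function, and the quadratic cost in the number of rows usually calls for subsampling.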
### Methods and Experimental Design

1. **Data and Models**
   - Experiments use both simulated and real data.
   - The simulated data account for factors such as the species abundance distribution and observation time.
   - Several machine-learning models are compared (k-Nearest Neighbors, Random Forest, Boosted Regression Trees, Deep Neural Networks, Support Vector Machines, Naive Bayes, Convolutional Neural Networks), with the traditional generalized linear model as the benchmark.
2. **Performance Evaluation**
   - Prediction performance is evaluated with AUC (Area Under the Receiver Operating Characteristic Curve), TSS (True Skill Statistic), and Spearman's correlation coefficient.
   - For real data, classification-threshold-dependent measures such as accuracy, sensitivity, precision, and specificity are also calculated.
3. **Inference of Causal Trait Combinations**
   - The H-statistic is used to estimate the interaction strength between traits, and from it the causal trait combinations are inferred.

### Experimental Results

1. **Prediction Performance**
   - Without trait matching, the prediction performance of all models is close to random (AUC ≈ 0.5, TSS ≈ 0).
   - With uneven species abundances, prediction performance improves but remains limited.
   - With trait matching, the machine-learning models (especially Random Forest and Deep Neural Networks) predict substantially better than the traditional generalized linear model.
2. **Inference of Causal Trait Combinations**
   - Using the H-statistic, machine-learning models can successfully identify the causal trait combinations that lead to species interactions.
   - In two case studies, machine-learning models successfully predicted species interactions in a global plant-pollinator database and inferred plausible trait-matching rules for a Costa Rican plant-hummingbird network.

### Conclusion

The authors conclude that flexible machine-learning models have significant advantages for understanding species interactions in ecological networks: they improve prediction performance and better identify causal trait combinations. The authors expect these results to extend beyond plant-pollinator networks to other types of ecological networks. The study also highlights the potential of machine learning and artificial intelligence for inference in ecology.
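As an illustration of two of the evaluation metrics named in the summary, the sketch below computes AUC (threshold-free) and TSS (threshold-dependent) in plain Python. The labels and scores are made-up toy values, not data from the paper, and the function names `auc` and `tss` as well as the 0.5 default threshold are assumptions for this example.

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a random observed interaction (label 1)
    gets a higher score than a random non-interaction (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tss(labels, scores, threshold=0.5):
    """True Skill Statistic = sensitivity + specificity - 1 at one threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Toy example: 1 = interaction observed, 0 = not observed.
labels = [1, 1, 0, 0, 1, 0]
informative = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]   # scores that mostly rank interactions first
uninformative = [0.5] * 6                      # scores that carry no information

print(auc(labels, informative), tss(labels, informative))      # AUC = 8/9, TSS = 2/3
print(auc(labels, uninformative), tss(labels, uninformative))  # AUC = 0.5, TSS = 0.0
```

The uninformative scores reproduce the random baseline quoted in the results (AUC ≈ 0.5, TSS ≈ 0). AUC is unaffected by the choice of classification threshold, while TSS changes with it.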
