Machine learning algorithms to infer trait-matching and predict species interactions in ecological networks

Maximilian Pichler, Virginie Boreux, Alexandra-Maria Klein, Matthias Schleuning, Florian Hartig
DOI: https://doi.org/10.1111/2041-210X.13329
2019-11-05
Abstract: Ecologists have long suspected that species are more likely to interact if their traits match in a particular way. For example, a pollination interaction may be more likely if the proportions of a bee's tongue fit a plant's flower shape. Empirical estimates of the importance of trait-matching for determining species interactions, however, vary significantly among different types of ecological networks. Here, we show that ambiguity among empirical trait-matching studies may have arisen at least in parts from using overly simple statistical models. Using simulated and real data, we contrast conventional generalized linear models (GLM) with more flexible Machine Learning (ML) models (Random Forest, Boosted Regression Trees, Deep Neural Networks, Convolutional Neural Networks, Support Vector Machines, naive Bayes, and k-Nearest-Neighbor), testing their ability to predict species interactions based on traits, and infer trait combinations causally responsible for species interactions. We find that the best ML models can successfully predict species interactions in plant-pollinator networks, outperforming GLMs by a substantial margin. Our results also demonstrate that ML models can better identify the causally responsible trait-matching combinations than GLMs. In two case studies, the best ML models successfully predicted species interactions in a global plant-pollinator database and inferred ecologically plausible trait-matching rules for a plant-hummingbird network, without any prior assumptions. We conclude that flexible ML models offer many advantages over traditional regression models for understanding interaction networks. We anticipate that these results extrapolate to other ecological network types. More generally, our results highlight the potential of machine learning and artificial intelligence for inference in ecology, beyond standard tasks such as image or pattern recognition.
Populations and Evolution, Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
The problem this paper addresses is: **how to use machine learning algorithms to predict species interactions more accurately and to infer the trait-matching combinations that give rise to these interactions**. Specifically, the authors ask whether species interactions in ecological networks, especially plant-pollinator networks, can be predicted from how well the species' traits match. Conventional statistical models such as the generalized linear model (GLM) are limited for this task, so the authors compare them against several machine learning models (Random Forest, Boosted Regression Trees, Deep Neural Networks, and others) to improve both prediction accuracy and the inference of trait-matching rules.

### Background and Problems of the Paper

1. **Importance of Species Interactions**
   - Ecologists have long suspected that species are more likely to interact if their traits match in a particular way. For example, pollination may be more likely when the proportions of a bee's tongue fit the shape of a flower.
   - However, empirical estimates of the importance of trait-matching vary significantly among different types of ecological networks.
2. **Limitations of Existing Research**
   - Most previous studies used simple statistical models (such as GLMs), which may be too simple to capture complex interactions between traits.
   - Machine learning models (such as Random Forest and Deep Neural Networks) are more flexible and predict better, but they are harder to interpret.
3. **Research Purposes**
   - **Evaluate the Prediction Performance of Different Machine Learning Models**: Compare how well different machine learning models predict species interactions in plant-pollinator networks, using simulated and real data.
   - **Infer Causal Trait Combinations**: Use methods such as the H-statistic to extract, from the fitted machine learning models, the trait combinations causally responsible for species interactions.
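To make the H-statistic idea concrete, here is a minimal sketch in plain Python. It assumes the H-statistic meant here is the pairwise interaction statistic of Friedman and Popescu, computed from centered partial dependence functions; the function name `h_statistic`, the toy "models", and the trait grid are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def h_statistic(model, data, j, k):
    """Pairwise H^2 (assumed Friedman-Popescu form) for columns j and k.

    model: callable taking one row (list of numbers), returning a prediction.
    data:  list of rows; partial dependence is averaged over these rows.
    """

    def partial_dependence(fixed):
        # Mean prediction with the columns in `fixed` pinned to given values
        # while all other columns keep their observed values.
        preds = []
        for row in data:
            x = list(row)
            for col, value in fixed.items():
                x[col] = value
            preds.append(model(x))
        return mean(preds)

    def centered(values):
        m = mean(values)
        return [v - m for v in values]

    pd_jk = centered([partial_dependence({j: r[j], k: r[k]}) for r in data])
    pd_j = centered([partial_dependence({j: r[j]}) for r in data])
    pd_k = centered([partial_dependence({k: r[k]}) for r in data])

    # Share of the joint effect's variance not explained by the two main effects.
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return numerator / denominator if denominator > 0 else 0.0


# Toy data (made up): column 0 is a pollinator trait, column 1 a plant trait.
grid = [[a, b] for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]


def additive(x):
    return x[0] + x[1]  # traits act independently, no matching


def matching(x):
    return x[0] * x[1]  # response depends only on the trait combination


h_additive = h_statistic(additive, grid, 0, 1)
h_matching = h_statistic(matching, grid, 0, 1)
print(h_additive, h_matching)
```

On this toy grid the additive model yields a value of 0 and the purely multiplicative model a value of 1, which is the contrast the statistic is meant to expose. The sketch evaluates the model once per pair of rows, so a real analysis would normally subsample the data or use an existing implementation.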
### Methods and Experimental Design

1. **Data and Models**
   - Experiments use both simulated and real data.
   - The simulations account for factors such as the species abundance distribution and observation time.
   - Several machine learning models are fitted (k-Nearest-Neighbor, Random Forest, Boosted Regression Trees, Deep Neural Networks, Support Vector Machines, naive Bayes, Convolutional Neural Networks), with the conventional GLM as a benchmark.
2. **Performance Evaluation**
   - Predictive performance is measured with AUC (area under the receiver operating characteristic curve), TSS (True Skill Statistic), and Spearman's correlation coefficient.
   - For real data, threshold-dependent classification metrics such as accuracy, sensitivity, precision, and specificity are also calculated.
3. **Inference of Causal Trait Combinations**
   - The H-statistic is used to estimate the interaction strength between traits, from which the causal trait combinations are inferred.

### Experimental Results

1. **Prediction Performance**
   - Without trait-matching, all models predict at close to chance level (AUC ≈ 0.5, TSS ≈ 0).
   - With uneven species abundances, predictive performance improves but remains limited.
   - With trait-matching, the machine learning models (especially Random Forest and Deep Neural Networks) predict substantially better than the conventional GLM.
2. **Inference of Causal Trait Combinations**
   - Using the H-statistic, machine learning models can successfully identify the causal trait combinations that lead to species interactions.
   - In two case studies, machine learning models successfully predicted species interactions in a global plant-pollinator database and inferred ecologically plausible trait-matching rules for a Costa Rican plant-hummingbird network.

### Conclusion

The authors conclude that flexible machine learning models offer significant advantages for understanding species interactions in ecological networks: they improve predictive performance and better identify causal trait combinations. The authors expect these results to extend beyond plant-pollinator networks to other types of ecological networks. The study also highlights the potential of machine learning and artificial intelligence for inference in ecology.
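The two headline metrics from the performance evaluation, AUC and TSS, can be sketched in a few lines of plain Python. This is an illustration only: the labels and predicted probabilities are made up, the function names are hypothetical, and the 0.5 classification threshold for TSS is an assumption, since the summary does not say how the threshold was chosen.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tss(labels, scores, threshold=0.5):
    """True Skill Statistic = sensitivity + specificity - 1 at the given threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, predicted))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Made-up example: 1 = interaction observed, 0 = no interaction observed.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1]

auc_value = auc(observed, predicted_prob)  # 14/15, about 0.933
tss_value = tss(observed, predicted_prob)  # 2/3 + 4/5 - 1, about 0.467
print(round(auc_value, 3), round(tss_value, 3))
```

AUC is threshold-free, while TSS depends on the cutoff, which is why a model with a high AUC can still show a modest TSS at a poorly chosen threshold. A model that predicts at random gives AUC near 0.5 and TSS near 0, matching the baseline reported for the simulations without trait-matching.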