gFlora: a topology-aware method to discover functional co-response groups in soil microbial communities

Nan Chen, Merlijn Schram, Doina Bucur
2024-07-17
Abstract: We aim to learn the functional co-response group: a group of taxa whose co-response effect (the representative characteristic of the group, showing the total topological abundance of its taxa) co-responds (associates well statistically) to a functional variable. Unlike the state-of-the-art method, we model the soil microbial community as an ecological co-occurrence network with the taxa as nodes (weighted by their abundance) and their relationships (a combination of both spatial and functional ecological aspects) as edges (weighted by the strength of the relationships). We then design a method called gFlora, which notably uses graph convolution over this co-occurrence network to obtain the co-response effect of the group, so that the network topology is also considered in the discovery process. We evaluate gFlora on two real-world soil microbiome datasets (bacteria and nematodes) and compare it with the state-of-the-art method. gFlora outperforms it on all evaluation metrics and discovers new functional evidence for taxa that were so far under-studied. We show that the graph convolution step is crucial for taxa with relatively low abundance (thus removing the bias towards taxa with higher abundance), and that the discovered bacteria of different genera are distributed across the co-occurrence network but still tightly connected among themselves, demonstrating that topologically they fill different but collaborative functional roles in the ecological community.
Machine Learning
What problem does this paper attempt to address?
This paper proposes gFlora, a method for discovering functional co-response groups in soil microbial communities: groups of taxa whose combined, topology-aware abundance associates well statistically with a functional variable. The underlying problem is that soil ecosystems are so large and complex that it is difficult to determine which taxa drive a specific soil function. gFlora addresses this by modeling the community as an ecological co-occurrence network, in which nodes are taxa (weighted by abundance) and edges are relationships between taxa (combining spatial and functional ecological aspects, and weighted by relationship strength). It then applies graph convolution over this network to obtain the co-response effect of a candidate group, so that network topology is taken into account during discovery. On two real-world soil microbiome datasets (bacteria and nematodes), gFlora outperforms the state-of-the-art method on all evaluation metrics and discovers new functional evidence for previously under-studied taxa. The authors show that the graph convolution step is crucial for taxa with relatively low abundance, which removes the bias towards high-abundance taxa. They also show that the discovered bacteria of different genera are distributed across the co-occurrence network yet tightly connected among themselves, suggesting that they fill different but collaborative functional roles in the ecological community. In summary, gFlora provides a simpler and more interpretable model of soil biological function, helping scientists better understand and manage soil health.
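The idea of a topology-aware co-response effect can be illustrated with a minimal sketch. It is not the authors' implementation: the text above does not specify the exact convolution operator or association statistic, so the sketch assumes a single propagation step with a symmetrically normalized adjacency matrix with self-loops (a common GCN-style choice) and Pearson correlation as the association measure. The function names `co_response_effect` and `association` are hypothetical.

```python
import numpy as np


def co_response_effect(abundance, adjacency, group):
    """Sum the graph-smoothed abundance of the taxa in `group`, per sample.

    abundance: (n_samples, n_taxa) array of taxon abundances.
    adjacency: (n_taxa, n_taxa) symmetric, non-negative edge weights.
    group:     list of taxon indices forming the candidate group.

    Assumption: one propagation step with D^-1/2 (A + I) D^-1/2.
    """
    n_taxa = adjacency.shape[0]
    a_hat = adjacency + np.eye(n_taxa)        # add self-loops
    deg = a_hat.sum(axis=1)                   # weighted degree, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    smoothed = abundance @ a_norm             # each taxon mixes in its neighbours
    return smoothed[:, group].sum(axis=1)


def association(effect, functional_variable):
    """Pearson correlation, used here as an assumed stand-in association measure."""
    return float(np.corrcoef(effect, functional_variable)[0, 1])


# Toy data (made up for illustration): 4 soil samples, 3 taxa.
abundance = np.array([
    [10.0, 0.5, 3.0],
    [12.0, 0.7, 1.0],
    [ 8.0, 0.2, 4.0],
    [15.0, 0.9, 2.0],
])
adjacency = np.array([
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.1],
    [0.0, 0.1, 0.0],
])
functional_variable = np.array([1.1, 1.4, 0.9, 1.8])

effect = co_response_effect(abundance, adjacency, group=[1])
score = association(effect, functional_variable)
```

In this toy setup the low-abundance taxon at index 1 inherits signal from its strongly connected neighbour at index 0, which shows how propagating over the network could reduce the bias towards high-abundance taxa. With no edges at all, the function reduces to a plain sum of the group's abundances.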