Probing the eukaryotic microbes of ruminants with a deep-learning classifier and comprehensive protein databases

Ming Yan, Thea Os Andersen, Phil B. Pope, Zhongtang Yu
DOI: https://doi.org/10.1101/2024.07.17.603995
2024-07-18
Abstract: Metagenomics, particularly genome-resolved metagenomics, has significantly deepened our understanding of microbes, illuminating their taxonomic and functional diversity and their roles in ecology, physiology, and evolution. However, eukaryotic populations within various microbiomes, including those in the mammalian gastrointestinal (GI) tract, remain relatively underexplored in metagenomic studies owing to the lack of comprehensive reference genome databases and robust bioinformatics tools. The GI tract of ruminants, particularly the rumen, harbors a high biomass, though a relatively low diversity, of eukaryotes, namely ciliates and fungi, which significantly affect feed digestion, methane emissions, and rumen microbial ecology. In the present study, we developed GutEuk, a bioinformatics tool that improves upon the currently available tools Tiara and EukRep in accurately identifying eukaryotic sequences in metagenomes. GutEuk is optimized for high precision across different sequence lengths. It can also distinguish fungal from protozoal sequences, facilitating further elucidation of their distinct ecological and physiological impacts. Applying GutEuk to more than one thousand rumen metagenomes enabled a comprehensive analysis of protozoa and fungi and revealed greater genomic diversity among protozoa than previously documented. We further curated several ruminant eukaryotic protein databases, which substantially improve our ability to distinguish the functional roles of ruminant fungi and protozoa from those of prokaryotes. Overall, the newly developed GutEuk package and its associated databases create new opportunities for in-depth study of GI tract eukaryotes.
Microbiology