Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in prokaryotic metagenome-assembled genome recovery
Ulisses Rocha,Jonas Coelho Kasmanas,Rodolfo Toscan,Danilo S. Sanches,Stefania Magnusdottir,Joao Pedro Saraiva
DOI: https://doi.org/10.1371/journal.pcbi.1012530
2024-10-24
PLoS Computational Biology
Abstract: We hypothesize that sample species abundance, sequencing depth, and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample species abundance, sequencing depth, and taxonomic distribution profiles, using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~92%), outperforming both the 8K pipeline and the Multi-Metagenome (MM) pipeline developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes when using the 8K and MM pipelines, although with contrasting patterns: the MM pipeline recovered more MAGs found in the original communities at sequencing depths up to 60 million reads, while the 8K pipeline recovered more true positives in communities sequenced above 60 million reads. DT showed the best recovery of species from the same genus, even though closely related species had a low recovery rate in all pipelines. Our results highlight that a larger number of bins does not necessarily reflect the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicate that the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.
Microbial communities are incredibly diverse and play essential roles in ecosystems, from recycling nutrients to influencing climate change. We explored how the composition of a microbial community can influence the metagenomic recovery of its species. Specifically, we examined how the abundance of different species within a sample, the extent of DNA sequencing (sequencing depth), and the taxonomic relatedness of the species affect our ability to accurately reconstruct these communities. We computationally simulated three microbial communities, each composed of 42 species. These communities varied in species abundance, sequencing depth, and how closely related the species were to each other. We then applied three different computational techniques to reconstruct the original communities from the simulated sequence data. Our findings highlight the critical impact of sequencing depth and taxonomic relatedness on accurately recovering microbial genomes. Interestingly, more sequencing does not always equate to more accurate community representation. Moreover, even a few false positives can significantly distort our interpretations of microbial diversity and function. Our research underscores the importance of carefully considering these factors in metagenomic studies to avoid misleading conclusions about microbial ecosystems. Our work contributes to refining metagenomic techniques, aiming for a more reliable and nuanced understanding of microbial life's role in our planet's health and functioning.
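The idea that more bins need not mean a more faithful community picture can be illustrated with a minimal sketch. The definitions used here are assumptions, not necessarily the scoring used in the study: a true positive is a recovered MAG assigned to a species present in the simulated community, and a false positive is a MAG assigned to anything else. The function name `score_recovery` and the species labels are hypothetical.

```python
def score_recovery(community_species, mag_assignments):
    """Score MAG recovery against a known simulated community.

    community_species: species names present in the simulated community.
    mag_assignments: one species name per recovered MAG (hypothetical input;
    in practice this would come from a taxonomic assignment step).
    """
    community = set(community_species)

    # Assumed definitions: a MAG counts as a true positive if its assigned
    # species is in the community, and as a false positive otherwise.
    true_pos = [s for s in mag_assignments if s in community]
    false_pos = [s for s in mag_assignments if s not in community]

    # Several MAGs may map to the same species, so count distinct species.
    recovered = set(true_pos)

    precision = len(true_pos) / len(mag_assignments) if mag_assignments else 0.0
    recall = len(recovered) / len(community) if community else 0.0
    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "precision": precision,
        "species_recall": recall,
    }


# Toy example: four MAGs are recovered, but only two distinct community
# species are represented and one MAG matches nothing in the community.
result = score_recovery(["A", "B", "C", "D"], ["A", "B", "B", "X"])
print(result)
```

In this toy case the pipeline reports as many bins as there are species, yet only half of the community is actually recovered, which is why bin counts alone are a poor proxy for community resolution.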
biochemical research methods,mathematical & computational biology