Abstract: We hypothesize that sample species abundance, sequencing depth, and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample species abundance, sequencing depth, and taxonomic distribution profiles, using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~92%), outperforming both the 8K pipeline and the Multi-Metagenome pipeline (MM) developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes with the 8K and MM pipelines, although in contrasting ways: the MM pipeline recovered more MAGs found in the original communities at sequencing depths up to 60 million reads, while the 8K pipeline recovered more true positives in communities sequenced above 60 million reads. DT showed the best recovery of species from the same genus, although closely related species had a low recovery rate in all pipelines. Our results highlight that a larger number of bins does not necessarily reflect the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicate that the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.

Microbial communities are incredibly diverse and play essential roles in ecosystems, from recycling nutrients to influencing climate change. We explored how the assembly of a microbial community can influence the metagenomic recovery of its species. Specifically, we examined how the abundance of different species within a sample, the extent of DNA sequencing (sequencing depth), and the taxonomic relatedness of the species affect our ability to accurately reconstruct these communities. We computationally simulated three microbial communities, each composed of 42 species. These communities varied in species abundance, sequencing depth, and how closely related the species were to each other. We then applied three different computational techniques to reconstruct the original communities from the simulated sequence data. Our findings highlight the critical impact of sequencing depth and taxonomic relatedness on accurately recovering microbial genomes. Interestingly, more sequencing does not always equate to a more accurate community representation. Moreover, even a few false positives can significantly distort our interpretations of microbial diversity and function. Our research underscores the importance of carefully considering these factors in metagenomic studies to avoid misleading conclusions about microbial ecosystems. Our work contributes to refining metagenomic techniques, aiming for a more reliable and nuanced understanding of the role of microbial life in our planet's health and functioning.
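The comparison of recovered MAGs against a known in silico community can be illustrated with a minimal sketch. It is not the scoring procedure used in this study. It assumes that each MAG has already been assigned a species label by some external classifier, that a MAG counts as a true positive when its label is a member of the simulated community, and that "precision" means true positives divided by all MAGs. The function name, the species names, and the bin identifiers are hypothetical.

```python
def score_mag_recovery(community_species, mag_assignments):
    """Score recovered MAGs against the species of a simulated community.

    community_species: iterable of species names in the in silico community.
    mag_assignments: dict mapping a MAG/bin identifier to the species label
        assigned to it (None when no label could be assigned).
    Assumption: a MAG is a true positive if its label is in the community;
    every other MAG, including unlabeled ones, is a false positive.
    """
    community = set(community_species)
    recovered = set()  # distinct community species seen in at least one MAG
    true_pos = 0
    false_pos = 0
    for species in mag_assignments.values():
        if species in community:
            true_pos += 1
            recovered.add(species)
        else:
            false_pos += 1
    total = true_pos + false_pos
    return {
        "mags": total,
        "true_positives": true_pos,
        "false_positives": false_pos,
        # Fraction of MAGs that correspond to a community member.
        "precision": true_pos / total if total else 0.0,
        # Fraction of community species recovered by at least one MAG.
        "species_recall": len(recovered) / len(community) if community else 0.0,
    }


if __name__ == "__main__":
    # Toy data: four community members, five bins from a hypothetical pipeline.
    community = [
        "Escherichia coli",
        "Bacillus subtilis",
        "Pseudomonas putida",
        "Pseudomonas fluorescens",
    ]
    mags = {
        "bin.1": "Escherichia coli",
        "bin.2": "Bacillus subtilis",
        "bin.3": "Pseudomonas putida",
        "bin.4": "Staphylococcus aureus",  # not in the community
        "bin.5": None,                     # unclassified bin
    }
    print(score_mag_recovery(community, mags))
```

In this toy case a pipeline that reports five bins recovers only three of the four community members, which illustrates why the number of bins alone says little about how well the community was reconstructed.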
