Abstract: We hypothesize that sample species abundance, sequencing depth, and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample species abundance, sequencing depth, and taxonomic distribution profiles, using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~92%), outperforming the 8K pipeline and the Multi-Metagenome pipeline (MM) developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes with the 8K and MM pipelines, although with contrasting patterns: the MM pipeline recovered more MAGs found in the original communities at sequencing depths up to 60 million reads, while the 8K pipeline recovered more true positives in communities sequenced above 60 million reads. DT showed the best recovery of species from the same genus, although closely related species had a low recovery rate in all pipelines. Our results highlight that more bins do not necessarily reflect the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicate that the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.

Microbial communities are incredibly diverse and play essential roles in ecosystems, from recycling nutrients to influencing climate change. We explored how the structure of a microbial community can influence the metagenomic recovery of its species. Specifically, we examined how the abundance of different species within a sample, the extent of DNA sequencing (sequencing depth), and the taxonomic relatedness of the species affect our ability to accurately reconstruct these communities. We computationally simulated three microbial communities, each composed of 42 species. These communities varied in species abundance, sequencing depth, and how closely related the species were to each other. We then applied three different computational techniques to reconstruct the original communities from the simulated sequence data. Our findings highlight the critical impact of sequencing depth and taxonomic relatedness, in particular, on the accurate recovery of microbial genomes. Interestingly, more sequencing does not always equate to a more accurate community representation. Moreover, even a few false positives can significantly distort our interpretations of microbial diversity and function. Our research underscores the importance of carefully considering these factors in metagenomic studies to avoid misleading conclusions about microbial ecosystems. Our work contributes to refining metagenomic techniques, aiming for a more reliable and nuanced understanding of the role of microbial life in our planet's health and functioning.
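
Because the simulated communities have a known composition, recovered MAGs can be scored against the list of species that were actually present. The following is a minimal Python sketch of that kind of comparison, not the scoring procedure used in the study: the function name `score_recovery`, the `RecoveryScore` fields, the species labels, and the assumption that each MAG has already been assigned a single species label are all hypothetical choices made for illustration.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryScore:
    """Counts of distinct species, not of individual MAGs (illustrative only)."""
    true_positives: int   # recovered species that are in the simulated community
    false_positives: int  # recovered species that are NOT in the community
    missed: int           # community species with no recovered MAG

    @property
    def precision(self) -> float:
        recovered = self.true_positives + self.false_positives
        return self.true_positives / recovered if recovered else 0.0

    @property
    def recall(self) -> float:
        expected = self.true_positives + self.missed
        return self.true_positives / expected if expected else 0.0


def score_recovery(expected_species: set[str],
                   mag_labels: Iterable[str]) -> RecoveryScore:
    """Compare species labels assigned to MAGs with the known community.

    Assumption: several MAGs carrying the same label are collapsed into
    one recovered species, so redundant bins are not penalised here.
    """
    recovered = set(mag_labels)
    return RecoveryScore(
        true_positives=len(recovered & expected_species),
        false_positives=len(recovered - expected_species),
        missed=len(expected_species - recovered),
    )


if __name__ == "__main__":
    # Hypothetical toy community of four species and four recovered MAGs.
    community = {"sp_A", "sp_B", "sp_C", "sp_D"}
    labels = ["sp_A", "sp_B", "sp_B", "sp_X"]
    score = score_recovery(community, labels)
    print(score, round(score.precision, 3), score.recall)
```

Counting at the species level keeps the sketch short; a pipeline that produces many bins for the same genome, or bins that match no community member, would need a stricter definition of a false positive than the one assumed here.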

Metapresence: a tool for accurate species detection in metagenomics based on the genome-wide distribution of mapping reads

Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads

Evaluation of computational methods for human microbiome analysis using simulated data

PM-profiler: a high-resolution and fast tool for taxonomy annotation of amplicon-based microbiome

MetaMap: an Atlas of Metatranscriptomic Reads in Human Disease-Related RNA-seq Data

Large-scale estimation of bacterial and archaeal DNA prevalence in metagenomes reveals biome-specific patterns

Removal of false positives in metagenomics-based taxonomy profiling via targeting Type IIB restriction sites

Extending and improving metagenomic taxonomic profiling with uncharacterized species with MetaPhlAn 4

A user's guide to the bioinformatic analysis of shotgun metagenomic sequence data for bacterial pathogen detection

KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

CAIM: Coverage-based Analysis for Identification of Microbiome

Moving Toward Metaproteogenomics: A Computational Perspective on Analyzing Microbial Samples via Proteogenomics

MetaScope - Fast and accurate identification of microbes in metagenomic sequencing data

Enhancing antimicrobial resistance detection with MetaGeneMiner: Targeted gene extraction from metagenomes

MEDUSA: A Pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences

[Microbial metaproteomics--From sample processing to data acquisition and analysis]

Nasal Septal Anatomy in Skeletally Mature Patients With Cleft Lip and Palate.

SingleM and Sandpiper: Robust microbial taxonomic profiles from metagenomic data

Meta-Apo improves accuracy of 16S-amplicon-based prediction of microbiome function

Taxonomic classification and abundance estimation using 16S and WGS: A comparison using controlled reference samples

Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in prokaryotic metagenome-assembled genome recovery