Extending and improving metagenomic taxonomic profiling with uncharacterized species with MetaPhlAn 4
N. Segata,Sarah E. Berry,Francesca Giordano,F. Asnicar,F. Cumbo,L. McIver,J. Wolf,Curtis Huttenhower,Kelsey N. Thompson,Kun D. Huang,R. Davies,Eric A. Franzosa,A. Blanco-Míguez,M. Valles-Colomer,Edoardo Pasolli,Léonard Dubois,Moreno Zolfo,F. Beghini,P. Manghi,Timothy D. Spector,A. M. Thomas,Adrian Tett,G. Piccinno,Elisa Piperni,Michal Punčochář
DOI: https://doi.org/10.1101/2022.08.22.504593
2022-08-22
bioRxiv
Abstract: Metagenomic assembly enables novel organism discovery from microbial communities, but from most metagenomes it can capture only a few abundant organisms. Here, we present a method, MetaPhlAn 4, that integrates information from both metagenome assemblies and microbial isolate genomes for improved and more comprehensive metagenomic taxonomic profiling. From a curated collection of 1.01M prokaryotic reference and metagenome-assembled genomes, we defined unique marker genes for 26,970 species-level genome bins, 4,992 of them taxonomically unidentified at the species level. MetaPhlAn 4 explains ∼20% more reads in most international human gut microbiomes and >40% in less-characterized environments such as the rumen microbiome, and proved more accurate than available alternatives on synthetic evaluations while also reliably quantifying organisms with no cultured isolates. Application of the method to >24,500 metagenomes highlighted previously undetected species as strong biomarkers for host conditions and lifestyles in human and mouse microbiomes, and showed that even previously uncharacterized species can be genetically profiled at the resolution of single microbial strains. MetaPhlAn 4 thus integrates the novelty of metagenomic assemblies with the sensitivity and fidelity of reference-based analyses, providing efficient metagenomic profiling of uncharacterized species and enabling deeper and more comprehensive microbiome biomarker detection.
Biology,Computer Science,Environmental Science