SingleM and Sandpiper: Robust microbial taxonomic profiles from metagenomic data

Ben J. Woodcroft, Samuel T. N. Aroney, Rossen Zhao, Mitchell Cunningham, Joshua A. M. Mitchell, Linda Blackall, Gene W. Tyson
DOI: https://doi.org/10.1101/2024.01.30.578060
2024-01-31
Abstract: Determining the taxonomy and relative abundance of microorganisms in metagenomic data is a foundational problem in microbial ecology. To address the limitations of existing approaches, we developed ‘SingleM’, which estimates community composition using conserved regions within universal marker genes. SingleM accurately profiles complex communities of known microbial species, and is the only tool that detects species without genomic representation, even those representing novel phyla. Given SingleM’s computational efficiency, we applied it to 248,559 publicly available metagenomes and show that the vast majority of samples from marine, freshwater, sediment and soil environments are dominated by novel species lacking genomic representation (median relative abundance 75.0%). SingleM also provides a way to identify metagenomes for the recovery of novel metagenome-assembled genomes from lineages of interest, and can incorporate user-recovered genomes into its reference database to improve profiling resolution. Quantifying the full diversity of Bacteria and Archaea in metagenomic data shows that microbial genome databases are far from saturated.
Microbiology