Lightweight taxonomic profiling of long-read metagenomic datasets with Lemur and Magnet

Nicolae Sapoval, Yunxi Liu, Kristen Curry, Bryce Kille, Wenyu Huang, Natalie Kokroko, Michael G Nute, Alona Tyshaieva, Alexander T Dilthey, Erin Molloy, Todd J Treangen
DOI: https://doi.org/10.1101/2024.06.01.596961
2024-08-25
Abstract: The advent of long-read sequencing of microbiomes necessitates the development of new taxonomic profilers tailored to long-read shotgun metagenomic datasets. Here, we introduce Lemur and Magnet, a pair of tools optimized for lightweight and accurate taxonomic profiling of long-read shotgun metagenomic datasets. Lemur is a marker-gene-based method that leverages an expectation-maximization (EM) algorithm to reduce false positive calls while preserving true positives; Magnet is a whole-genome, read-mapping-based method that provides detailed presence and absence calls for bacterial genomes. We demonstrate that Lemur and Magnet can run in minutes to hours on a laptop with 32 GB of RAM, even for large inputs, a crucial feature given the portability of long-read sequencing machines. Furthermore, the marker gene database used by Lemur is only 4 GB and contains information from over 300,000 RefSeq genomes. Lemur and Magnet are open-source and available at https://github.com/treangenlab/lemur and https://github.com/treangenlab/magnet.
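To illustrate how an EM algorithm can suppress false positive taxa, the following is a minimal, generic sketch of EM for mixture proportions over read-to-taxon likelihoods. It is not Lemur's implementation: the function name `em_abundances`, the input layout, and the toy likelihood values are all assumptions made for illustration.

```python
def em_abundances(likelihoods, n_iter=100, tol=1e-9):
    """Estimate relative abundances with a generic EM for mixture proportions.

    likelihoods[r][t] is an assumed, precomputed P(read r | taxon t).
    Hypothetical helper for illustration; not taken from Lemur.
    """
    n_taxa = len(likelihoods[0])
    pi = [1.0 / n_taxa] * n_taxa  # start from uniform abundances

    for _ in range(n_iter):
        counts = [0.0] * n_taxa
        # E-step: split each read fractionally across taxa.
        for row in likelihoods:
            weights = [p * l for p, l in zip(pi, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read is incompatible with every taxon
            for t, w in enumerate(weights):
                counts[t] += w / total

        assigned = sum(counts)
        if assigned == 0.0:
            break  # no read could be assigned; keep current estimate

        # M-step: abundances are the normalized fractional read counts.
        new_pi = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if delta < tol:
            break
    return pi


# Toy input (made up): 8 reads match only taxon 0,
# 2 reads match taxon 0 and taxon 1 equally well.
toy = [[0.9, 0.0]] * 8 + [[0.5, 0.5]] * 2
abundances = em_abundances(toy)
```

In this toy input, taxon 1 is supported only by ambiguous reads, so each iteration shifts more of their weight to taxon 0 and the estimate for taxon 1 shrinks toward zero. That behaviour is the general mechanism by which EM-based profilers can drop taxa that lack unique supporting evidence.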
Bioinformatics