High-quality metagenome assembly from long accurate reads with metaMDBG

Gaëtan Benoit, Sébastien Raguideau, Robert James, Adam M. Phillippy, Rayan Chikhi, Christopher Quince
DOI: https://doi.org/10.1038/s41587-023-01983-6
IF: 46.9
2024-01-02
Nature Biotechnology
Abstract: We introduce metaMDBG, a metagenomics assembler for PacBio HiFi reads. MetaMDBG combines a de Bruijn graph assembly in a minimizer space with an iterative assembly over sequences of minimizers to address variations in genome coverage depth and an abundance-based filtering strategy to simplify strain complexity. For complex communities, we obtained up to twice as many high-quality circularized prokaryotic metagenome-assembled genomes as existing methods and had better recovery of viruses and plasmids.
biotechnology & applied microbiology
What problem does this paper attempt to address?
The paper aims to address several key issues in metagenome assembly, particularly when dealing with long and accurate sequencing reads. Specifically, the research team developed a new algorithm called metaMDBG for assembling PacBio HiFi reads. This method combines a de Bruijn graph assembly strategy based on minimizer space with an iterative assembly process to tackle the challenges of varying coverage depth and strain complexity in metagenomes. The main objectives of the study include:

1. **Improving the recovery rate of complete circular genomes**: Compared to existing methods, metaMDBG can assemble more high-quality circular prokaryotic metagenome-assembled genomes (MAGs) from complex microbial communities.
2. **Enhancing the recovery rate of viruses and plasmids**: metaMDBG also excels in the assembly of viruses and plasmids, identifying more circular plasmids and phages than other methods.
3. **Addressing issues with low-abundance and high-abundance strains**: metaMDBG can effectively assemble both low-abundance and high-abundance strains, thereby improving the overall assembly quality.
4. **Increasing processing efficiency**: metaMDBG significantly outperforms existing assembly tools in terms of execution time and memory consumption, making it more suitable for handling large-scale datasets.

In summary, the goal of the paper is to improve the efficiency and accuracy of metagenome assembly while ensuring high-quality assemblies, especially when dealing with complex microbial communities.
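
To make the idea of a de Bruijn graph "in minimizer space" more concrete, the following Python sketch may help. It is only an illustration under simplifying assumptions, not metaMDBG's actual implementation: the function names (`select_minimizers`, `min_mers`, `build_graph`), the parameter values, and the choice of BLAKE2 as the hash are all hypothetical, and reverse complements, abundance-based filtering, and the iterative assembly are omitted. Each read is reduced to the ordered list of its short substrings whose hash falls below a density threshold, and graph nodes are tuples of `k` consecutive minimizers rather than `k` consecutive bases.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a short substring (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def select_minimizers(seq: str, l: int = 5, density: float = 0.3) -> list[str]:
    """Keep, in order, every l-mer whose hash falls below density * 2**64."""
    threshold = int(density * 2**64)
    return [
        seq[i:i + l]
        for i in range(len(seq) - l + 1)
        if kmer_hash(seq[i:i + l]) < threshold
    ]


def min_mers(minimizers: list[str], k: int = 3) -> list[tuple[str, ...]]:
    """Slide a window of k consecutive minimizers over the minimizer sequence."""
    return [tuple(minimizers[i:i + k]) for i in range(len(minimizers) - k + 1)]


def build_graph(reads: list[str], l: int = 5, k: int = 3,
                density: float = 0.3) -> dict[tuple[str, ...], set[tuple[str, ...]]]:
    """Nodes are k-min-mers; an edge links min-mers that are adjacent in a read,
    so consecutive nodes overlap by k - 1 minimizers."""
    graph: dict[tuple[str, ...], set[tuple[str, ...]]] = {}
    for read in reads:
        nodes = min_mers(select_minimizers(read, l, density), k)
        for node in nodes:
            graph.setdefault(node, set())
        for a, b in zip(nodes, nodes[1:]):
            graph[a].add(b)
    return graph


if __name__ == "__main__":
    # Toy reads; real HiFi reads are thousands of bases long.
    reads = ["ACGTTGCATGCCGATTACGGATCCA", "TGCCGATTACGGATCCATTGACCGT"]
    graph = build_graph(reads)
    print(f"{len(graph)} nodes in minimizer space")
```

Because only a fraction of positions are kept as minimizers, the graph is far smaller than a base-level de Bruijn graph over the same reads, which is the intuition behind the speed and memory gains described above.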