Efficient De Novo Assembly and Recovery of Microbial Genomes from Complex Metagenomes Using a Reduced Set of k-mers

Hajra Qayyum,Amjad Ali,Masood Ur Rehman
DOI: https://doi.org/10.1101/2024.06.08.598064
2024-06-10
Abstract:In recent years, the analysis of metagenomic data to recover unculturable microbes has revolutionized microbial genomics by rapidly expanding the reference genome catalog. Central to this, are the computational approaches of de novo assembly and genome binning that enable large-scale reference-independent recovery of microbial genomes from the metagenomic sequencing data. Despite the advancements in bioinformatics approaches to address the computational challenges inherent to these tasks, the limitation of computational resources continues to be a significant barrier to harvesting the full potential of these techniques. Consequently, there is a stressed need to devise strategies involving the fine-tuning of the employed parameters for the effective utilization of the available metagenomic tools. As most of the available metagenome assembly tools are based on the de Bruijn graph framework that relies on a parameter k, selecting an appropriate subset of k-mers has become a common approach in bioinformatics for efficient computations. In this study, we propose a reduced set of k-mers, optimized to strike a balance between computational efficiency and the quality of the high- and low-complexity metagenome assemblies. Utilizing this set of k-mers with MEGAHIT reduces the metagenome assembly time by half compared to the default set, thus greatly reducing the associated computational cost. In addition, it also brings the promise to improve large-scale genome binning studies that adopt this set in the future as we observed an increase in the total number of the recovered genomes as well as obtained higher proportions of high- and medium-quality genomes recovered from the reduced k-mers-based metagenome assemblies.
Bioinformatics
What problem does this paper attempt to address?