Eukfinder: a pipeline to retrieve microbial eukaryote genomes from metagenomic sequencing data

Dandan Zhao, Dayana E. Salas-Leiva, Shelby K. Williams, Katherine A. Dunn, Andrew J. Roger
DOI: https://doi.org/10.1101/2023.12.28.573569
2024-02-11
Abstract: Whole-genome shotgun (WGS) metagenomic sequencing of microbial communities allows us to discover the functions, physiologies, and evolutionary histories of the prokaryotic and eukaryotic microbes in diverse ecosystems. Despite the importance of microbial eukaryotes, metagenomic studies of them lag behind those of prokaryotes because it is difficult to identify and assemble high-quality eukaryotic genomes from WGS data. To address this problem, we have developed Eukfinder, a bioinformatics pipeline that recovers and assembles nuclear and mitochondrial genomes of eukaryotic microbes from WGS metagenomic data. As part of its workflow, it uses two specialized databases to classify reads by taxonomy; these databases can be customized to the dataset or environment of interest. We applied Eukfinder to human gut microbiome WGS metagenomic sequencing data to recover genomes from the protistan parasite Blastocystis sp., a highly prevalent colonizer of the gastrointestinal tract of humans and animals. We tested Eukfinder using both a series of simulated gut microbiome datasets, which included varying numbers of Blastocystis reads combined with bacterial reads, and real metagenomic gut samples containing Blastocystis. We compared the results of Eukfinder with those of other published workflows. With sufficient reads, Eukfinder efficiently assembles high-quality, near-complete nuclear and mitochondrial genomes from diverse Blastocystis subtypes from metagenomic data without the aid of a reference genome. Furthermore, with sufficient depth of sequence sampling, Eukfinder outperforms similar tools used to recover eukaryotic genomes from metagenomic data. Eukfinder will be a useful tool for reference-independent and cultivation-free study of eukaryotic microbial genomes from environmental metagenomic sequencing samples.
Bioinformatics