DeepMicrobeFinder Sorts Metagenomes into Prokaryotes, Eukaryotes and Viruses, with Marine Applications

Shengwei Hou, Siliangyu Cheng, Ting Chen, Jed A. Fuhrman, Fengzhu Sun
DOI: https://doi.org/10.21203/rs.3.rs-1016976/v1
2021-01-01
Abstract: Sequence classification is valuable for reducing the complexity of metagenomes and for providing a fundamental understanding of the composition of metagenomic samples. Binary metagenomic classifiers are an insufficient solution because metagenomes from most natural environments derive from multiple sequence sources, including prokaryotes, eukaryotes, and the viruses of both. Here we introduce DeepMicrobeFinder, a deep-learning-based (not reference-based) sequence classifier that assigns metagenomic contigs to five sequence classes, namely viruses infecting prokaryotic hosts, viruses infecting eukaryotic hosts, eukaryotic chromosomes, prokaryotic chromosomes, and prokaryotic plasmids. Across different sequence lengths, DeepMicrobeFinder achieved area under the receiver operating characteristic curve (AUC) scores >0.9 for most sequence classes, the exception being the distinction between prokaryotic chromosomes and plasmids. By benchmarking on 20 test datasets with variable sequence class composition, we showed that DeepMicrobeFinder obtained average accuracy scores of ~0.94, ~0.87, and ~0.92 for eukaryotic, plasmid, and viral contig classification, respectively, which were significantly higher than those of other state-of-the-art individual predictors. Using a 1-300 µm daily time-series metagenomic dataset sampled from coastal Southern California as a case study, we showed that the proportion of metagenomic reads recruited by eukaryotic contigs could be doubled with DeepMicrobeFinder's classification compared with reference-based classifiers. In addition, a positive correlation was observed between eukaryotic read proportions and potential prokaryotic community growth rates, suggesting an enrichment of fast-growing copiotrophs with increased eukaryotic particles. With its inclusive modeling and unprecedented performance, we expect that DeepMicrobeFinder will promote metagenomic studies of under-appreciated sequence types.
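To make the per-class AUC figures above concrete, the following is a minimal sketch of a one-vs-rest AUC computation for a five-class contig classifier. It is not DeepMicrobeFinder's code: the class label strings, the function names `binary_auc` and `one_vs_rest_auc`, and the score layout (one dictionary of class scores per contig) are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

# The five classes come from the abstract; these label strings are hypothetical.
CLASSES = [
    "prokaryotic_chromosome",
    "eukaryotic_chromosome",
    "prokaryotic_plasmid",
    "prokaryotic_virus",
    "eukaryotic_virus",
]


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example scores higher; tied pairs count as half a win."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    true_labels: Sequence[str],
    class_scores: Sequence[Dict[str, float]],
) -> Dict[str, float]:
    """For each class, treat that class as positive and all others as
    negative, then compute the binary AUC from that class's scores."""
    result: Dict[str, float] = {}
    for c in CLASSES:
        scores: List[float] = [row[c] for row in class_scores]
        is_pos: List[bool] = [label == c for label in true_labels]
        result[c] = binary_auc(scores, is_pos)
    return result
```

The pairwise loop is quadratic in the number of contigs, which is fine for illustration; a rank-based formulation would be the usual choice for large benchmarks.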