DeepMicroClass sorts metagenomes into prokaryotes, eukaryotes and viruses, with marine applications

Shengwei Hou,Tianqi Tang,Siliangyu Cheng,Ting Chen,Jed A. Fuhrman,Fengzhu Sun
DOI: https://doi.org/10.1101/2021.10.26.466018
2023-01-01
Abstract:Sequence classification reduces the complexity of metagenomes and facilitates a fundamental understanding of the structure and function of microbial communities. Binary metagenomic classifiers offer an insufficient solution because environmental metagenomes are typically derived from multiple sequence sources, including prokaryotes, eukaryotes and the viruses of both. Here we introduce a deep-learning based (as opposed to alignment-based) sequence classifier, DeepMicroClass, that classifies metagenomic contigs into five sequence classes, i.e., viruses infecting prokaryotic or eukaryotic hosts, eukaryotic or prokaryotic chromosomes, and prokaryotic plasmids. At different sequence lengths, DeepMicroClass achieved area under the receiver operating characteristic curve (AUC) scores >0.98 for most sequence classes, with the exception of distinguishing plasmids from prokaryotic chromosomes (AUC scores ≈ 0.97). By benchmarking on 20 designed datasets with variable sequence class composition, we showed that DeepMicroClass obtained average accuracy scores of ∼0.99, ∼0.97, and ∼0.99 for eukaryotic, plasmid and viral contig classification, respectively, which were significantly higher than the other state-of-the-art individual predictors. Using a 1-300 µm daily time-series metagenomic dataset sampled from coastal Southern California as a case study, we showed that metagenomic read proportions recruited by eukaryotic contigs could be doubled with DeepMicroClass’s classification compared to the counterparts of other alignment-based classifiers. With its inclusive modeling and unprecedented performance, we expect DeepMicroClass will be a useful addition to the toolbox of microbial ecologists, and will promote metagenomic studies of under-appreciated sequence types. ### Competing Interest Statement The authors have declared no competing interest.
What problem does this paper attempt to address?