IPEV: identification of prokaryotic and eukaryotic virus-derived sequences in virome using deep learning

Hengchuang Yin,Shufang Wu,Jie Tan,Qian Guo,Mo Li,Jinyuan Guo,Yaqi Wang,Xiaoqing Jiang,Huaiqiu Zhu
DOI: https://doi.org/10.1093/gigascience/giae018
IF: 7.658
2024-04-23
GigaScience
Abstract:Background The virome obtained through virus-like particle enrichment contains a mixture of prokaryotic and eukaryotic virus-derived fragments. Accurate identification and classification of these elements are crucial to understanding their roles and functions in microbial communities. However, the rapid mutation rates of viral genomes pose challenges in developing high-performance tools for classification, potentially limiting downstream analyses. Findings We present IPEV, a novel method to distinguish prokaryotic and eukaryotic viruses in viromes, with a 2-dimensional convolutional neural network combining trinucleotide pair relative distance and frequency. Cross-validation assessments of IPEV demonstrate its state-of-the-art precision, significantly improving the F1-score by approximately 22% on an independent test set compared to existing methods when query viruses share less than 30% sequence similarity with known viruses. Furthermore, IPEV outperforms other methods in accuracy on marine and gut virome samples based on annotations by sequence alignments. IPEV reduces runtime by at most 1,225 times compared to existing methods under the same computing configuration. We also utilized IPEV to analyze longitudinal samples and found that the gut virome exhibits a higher degree of temporal stability than previously observed in persistent personal viromes, providing novel insights into the resilience of the gut virome in individuals. Conclusions IPEV is a high-performance, user-friendly tool that assists biologists in identifying and classifying prokaryotic and eukaryotic viruses within viromes. The tool is available at https://github.com/basehc/IPEV.
multidisciplinary sciences
What problem does this paper attempt to address?