DeepEBV: a deep learning model to predict Epstein–Barr virus (EBV) integration sites
Jiuxing Liang,Zifeng Cui,Canbiao Wu,Yao Yu,Rui Tian,Hongxian Xie,Zhuang Jin,Weiwen Fan,Weiling Xie,Zhaoyue Huang,Wei Xu,Jingjing Zhu,Zeshan You,Xiaofang Guo,Xiaofan Qiu,Jiahao Ye,Bin Lang,Mengyuan Li,Songwei Tan,Zheng Hu
DOI: https://doi.org/10.1093/bioinformatics/btab388
IF: 5.8
2021-05-19
Bioinformatics
Abstract: Motivation: Epstein–Barr virus (EBV) is one of the most prevalent DNA oncogenic viruses. The integration of EBV into the host genome has been reported to play an important role in cancer development. The preference of EBV integration showed strong dependence on the local genomic environment, which enables the prediction of EBV integration sites. Results: An attention-based deep learning model, DeepEBV, was developed to predict EBV integration sites by learning local genomic features automatically. First, DeepEBV was trained and tested using the data from the dsVIS database. The results showed that DeepEBV with EBV integration sequences plus Repeat peaks and 2-fold data augmentation performed the best on the training dataset. Furthermore, the performance of the model was validated in an independent dataset. In addition, the motifs of DNA-binding proteins could influence the selection preference of viral insertional mutagenesis. Furthermore, the results showed that DeepEBV can predict EBV integration hotspot genes accurately. In summary, DeepEBV is a robust, accurate and explainable deep learning model, providing novel insights into EBV integration preferences and mechanisms. Availability and implementation: DeepEBV is available as open-source software and can be downloaded from https://github.com/JiuxingLiang/DeepEBV.git. Supplementary information: Supplementary data are available at Bioinformatics online.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology