CBIL-VHPLI: a model for predicting viral-host protein-lncRNA interactions based on machine learning and transfer learning

Man Zhang, Li Zhang, Ting Liu, Huawei Feng, Zhe He, Feng Li, Jian Zhao, Hongsheng Liu
DOI: https://doi.org/10.1038/s41598-024-68750-8
2024-07-30
Abstract: Virus‒host protein‒lncRNA interaction (VHPLI) predictions are critical for decoding the molecular mechanisms of viral pathogens and host immune processes. Although protein‒lncRNA interactions have been predicted in both plants and animals, they have not been extensively studied in viruses. For the first time, we propose a deep learning-based approach, named CBIL‒VHPLI, that consists mainly of convolutional neural network and bidirectional long short-term memory network modules combined with transfer learning to predict viral-host protein‒lncRNA interactions. The models were first trained on large and diverse datasets (including plants, animals, etc.). Protein sequence features were extracted using a k-mer method combined with the one-hot encoding and composition-transition-distribution (CTD) methods, and lncRNA sequence features were extracted using a k-mer method combined with the one-hot encoding and Z curve methods. The results obtained on three independent external validation datasets showed that the pre-trained CBIL‒VHPLI model performed the best, with an accuracy of approximately 0.9. Pretraining was followed by transfer learning on a viral protein-human lncRNA dataset; after fine-tuning, the accuracy of CBIL‒VHPLI was 0.946, which was significantly greater than that of previous models. In the final case study, CBIL‒VHPLI achieved a prediction reproducibility rate of 91.6% for the RIP-Seq experimental screening results. The model was then used to predict the interaction between the human lncRNA PIK3CD-AS2 and nonstructural protein 1 (NS1) of the H5N1 virus, and RNA pull-down experiments were used to verify the model's prediction. The source code of CBIL‒VHPLI and the datasets used in this work are available at https://github.com/Liu-Lab-Lnu/CBIL-VHPLI for academic usage.
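As a rough illustration of the lncRNA sequence features named in the abstract, the sketch below computes normalized k-mer frequencies and a simplified, whole-sequence Z curve descriptor. It is not taken from the CBIL‒VHPLI repository: the function names `kmer_frequencies` and `z_curve`, the default k = 3, and the use of a single global Z curve triple (rather than a cumulative or phase-specific variant) are assumptions made for this example only.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Normalized k-mer frequencies in a fixed lexicographic order (hypothetical helper)."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def z_curve(seq):
    """Simplified Z curve components from whole-sequence base frequencies (assumption)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return (0.0, 0.0, 0.0)
    a, c, g, u = (seq.count(base) / n for base in "ACGU")
    x = (a + g) - (c + u)  # purine vs. pyrimidine
    y = (a + c) - (g + u)  # amino vs. keto
    z = (a + u) - (g + c)  # weak vs. strong hydrogen bonding
    return (x, y, z)


if __name__ == "__main__":
    rna = "AUGGCUAACGU"  # made-up sequence, for demonstration only
    features = kmer_frequencies(rna, k=3) + list(z_curve(rna))
    print(len(features))  # 4**3 k-mer frequencies plus 3 Z curve components
```

A fixed k-mer ordering keeps the feature vector the same length for every sequence, which is what allows such features to be concatenated with other encodings before being passed to a neural network.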