Machine Learning Approaches for Predicting Virus-Human Protein-Protein Interactions: An Evaluation of Retroviral Interaction Networks

Omid Mahmoudi, Somayye Taghvaei, Shirin Salehi, Soheil Khosravi, Alireza Sazgar, Sara Zareei
DOI: https://doi.org/10.1101/2024.11.13.623326
2024-11-15
Abstract: Virus-human protein-protein interactions (VHPPI) are key to understanding how viruses manipulate host cellular functions. This study constructed a retroviral-human PPI network by integrating multiple public databases, resulting in 1,387 interactions between 29 retroviral and 1,026 human genes. Using minimal sequence similarity, we generated a pseudo-negative dataset for model reliability. Five machine learning models, Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and Random Forest (RF), were evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). LR and KNN models demonstrated the strongest predictive performance, with sensitivities up to 77% and specificities of 52%. Feature importance analysis identified GC content and semantic similarity as influential predictors. Models trained on selected features showed enhanced accuracy with reduced complexity. Our approach highlights the potential of computational models for VHPPI predictions, offering valuable insights into viral-host interaction networks and guiding therapeutic target identification.
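The five evaluation metrics named in the abstract all derive from a binary confusion matrix. The following is a minimal sketch of how they could be computed, assuming interacting pairs are labeled 1 and pseudo-negative pairs 0. The function name `confusion_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, PPV, and NPV.

    Assumes binary labels: 1 = interacting pair, 0 = pseudo-negative pair.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on true interactions
        "specificity": ratio(tn, tn + fp),  # recall on non-interactions
        "ppv": ratio(tp, tp + fp),          # precision of positive calls
        "npv": ratio(tn, tn + fn),          # precision of negative calls
    }


# Toy labels for illustration only (not taken from the paper).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 1, 0, 0]
toy_metrics = confusion_metrics(toy_true, toy_pred)
print(toy_metrics)
```

On this toy input the sensitivity is 0.75 and the specificity is 0.5, which shows how a model can recover most true interactions while still mislabeling many negatives.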
Bioinformatics