Abstract: Viruses of microbes are ubiquitous biological entities that reprogram their hosts' metabolisms during infection in order to produce viral progeny, impacting the ecology and evolution of microbiomes with broad implications for human and environmental health. Advances in genome sequencing have led to the discovery of millions of novel viruses and an appreciation for the great diversity of viruses on Earth. Yet, with knowledge of only "who is there?" we fall short in our ability to infer the impacts of viruses on microbes at population, community, and ecosystem scales. To do this, we need a more explicit understanding of "who do they infect?" Here, we developed a novel machine learning (ML) model, Virus-Host Interaction Predictor (VHIP), to predict virus-host interactions (infection/non-infection) from input virus and host genomes. This ML model was trained and tested on a high-value, manually curated set of 8849 virus-host pairs and their corresponding sequence data. The resulting dataset, 'Virus Host Range network' (VHRnet), is core to VHIP functionality. Each data point that underlies the VHIP training and testing represents a lab-tested virus-host pair in VHRnet, from which meaningful signals of viral adaptation to host were computed from genomic sequences. VHIP departs from existing virus-host prediction models in its ability to predict multiple interactions rather than predicting a single most likely host or host clade. As a result, VHIP is able to infer the complexity of virus-host networks in natural systems. VHIP predicts interactions between virus-host pairs at the species level with 87.8% accuracy and can be applied to novel viral and host population genomes reconstructed from metagenomic datasets.

The ecology and evolution of microbial communities are deeply influenced by viruses. Metagenomic analysis, the non-targeted sequencing of community genomes, has led to the discovery of millions of novel viruses. Yet the sequencing process recovers only DNA sequences, raising the question: which microbial hosts do those novel viruses infect? To address this question, we developed a computational tool that allows researchers to predict virus-host interactions from such sequence data. The power of this tool is its use of a high-value, manually curated set of 8849 lab-verified virus-host pairs and their corresponding sequence data. For each pair, we computed signals of coevolution to use as the predictive features in a machine learning model designed to predict interactions between viruses and hosts. The resulting model, Virus-Host Interaction Predictor (VHIP), has an accuracy of 87.8% and can be applied to novel viral and host genomes reconstructed from metagenomic datasets. Because the model considers all possible virus-host pairs, it can resolve complete virus-host interaction networks and supports a new avenue to apply network thinking to viral ecology.
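The pairwise framing described above (score every virus-host pair as infection or non-infection, then read the positive pairs as a network) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract says only that signals were computed from genomic sequences, so the two features here (k-mer profile distance and GC content difference), the function names, the toy sequences, and the threshold-based `toy_classifier` are hypothetical stand-ins and not VHIP's actual features, trained model, or API.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=3):
    """Return normalized k-mer frequencies as a dict (assumes len(seq) >= k)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two sparse k-mer profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(key, 0.0) - q.get(key, 0.0)) ** 2 for key in keys))


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def pair_features(virus_seq, host_seq, k=3):
    """Illustrative per-pair features (assumed, not taken from the source)."""
    return {
        "kmer_distance": profile_distance(
            kmer_profile(virus_seq, k), kmer_profile(host_seq, k)
        ),
        "gc_difference": abs(gc_content(virus_seq) - gc_content(host_seq)),
    }


def toy_classifier(features, max_gc_difference=0.15):
    """Hand-set threshold standing in for a trained infection/non-infection model."""
    return features["gc_difference"] <= max_gc_difference


def predict_network(viruses, hosts, classify):
    """Classify every virus-host pair and return the predicted infection edges."""
    edges = set()
    for (v_name, v_seq), (h_name, h_seq) in product(viruses.items(), hosts.items()):
        if classify(pair_features(v_seq, h_seq)):
            edges.add((v_name, h_name))
    return edges


# Invented toy sequences, far shorter than real genomes.
viruses = {
    "phage_A": "ATGCGCGCATGCGCGGCC",
    "phage_B": "ATATTTAAATATATTA",
}
hosts = {
    "host_1": "GCGCGCATGCGCCGGCAT",
    "host_2": "ATTTAAATATTAATAT",
    "host_3": "GCGGCCGCATGCGCGC",
}

network = predict_network(viruses, hosts, toy_classifier)
for virus, host in sorted(network):
    print(virus, "->", host)
```

Because each pair is classified independently, one virus can be linked to several hosts (here `phage_A` is paired with two), which is the property that separates a network-style prediction from a single best-host prediction.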

CBIL-VHPLI: a model for predicting viral-host protein-lncRNA interactions based on machine learning and transfer learning

Prediction and Analysis of Human-Herpes Simplex Virus Type 1 Protein-Protein Interactions by Integrating Multiple Methods

A Hybrid Prediction Method for Plant lncRNA-Protein Interaction

A multitask transfer learning framework for the prediction of virus-human protein-protein interactions

Deep Learning-Powered Prediction of Human-Virus Protein-Protein Interactions

ECA-PHV: Predicting Human-Virus Protein-Protein Interactions Through an Interpretable Model of Effective Channel Attention Mechanism

HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy

DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes

Machine Learning Approaches for Predicting Virus-Human Protein-Protein Interactions: An Evaluation of Retroviral Interaction Networks

LPIH2V: LncRNA-protein interactions prediction using HIN2Vec based on heterogeneous networks model

Prediction of virus-host associations using protein language models and multiple instance learning

Prediction of lncRNA–Protein Interactions via the Multiple Information Integration

Virus-host interactions predictor (VHIP): Machine learning approach to resolve microbial virus-host interaction networks

DeepLPI: a multimodal deep learning method for predicting the interactions between lncRNAs and protein isoforms

EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning

HGNNPIP: A Hybrid Graph Neural Network framework for Protein-protein Interaction Prediction

DPCIPI: A pre-trained deep learning model for predicting cross-immunity between drifted strains of Influenza A/H3N2

ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA

BJLD-CMI: a predictive circRNA-miRNA interactions model combining multi-angle feature information

LNRLMI: Linear neighbour representation for predicting lncRNA‐miRNA interactions

Multi-modal features-based human-herpesvirus protein–protein interaction prediction by using LightGBM