Abstract: Routes of virus transmission between hosts are key to understanding viral epidemiology. Different routes have large effects on viral ecology and on the likelihood and rate of transmission; for example, respiratory and vector-borne viruses together account for the majority of rapid outbreaks and high-consequence animal and plant epidemics. However, determining the specific transmission route(s) can take months to years, delaying mitigation efforts. Here, we identify the viral features and evolutionary signatures that are predictive of viral transmission routes and use them to rapidly predict, in silico, potential routes for fully sequenced viruses, both those with no observed routes and those with missing routes. We achieved this by compiling a dataset of 24,953 virus-host associations with 81 defined transmission routes, constructing a hierarchy of virus transmission encompassing those routes and 42 higher-order modes, and engineering 446 predictive features from three complementary perspectives. We integrated those data and features to train 98 independent ensembles of LightGBM classifiers. All features contributed to the prediction for at least one route or mode of transmission, demonstrating the utility of our broad multi-perspective approach. Our framework achieved ROC-AUC = 0.991 and F1-score = 0.855 across all included transmission routes and modes, and performed well for high-consequence respiratory (ROC-AUC = 0.990, F1-score = 0.864) and vector-borne transmission (ROC-AUC = 0.997, F1-score = 0.921). Our framework ranks the viral features in order of their contribution to prediction, per transmission route, and hence identifies the genomic evolutionary signatures associated with each route. Together with the more mature field of viral host-range prediction, our predictive framework could: provide early insights into the potential for, and pattern of, viral spread; facilitate rapid response with appropriate measures; and help triage the time-consuming investigations needed to confirm the likely routes of transmission.

Routes of virus transmission, the mechanism(s) by which a virus physically gets from an infected to an uninfected host, are crucial to understanding how viral diseases spread among animals and plants. Here, we uncover the evolutionary signatures that can predict the transmission routes a virus uses to move from one host to another, enabling us to identify unobserved routes for known viruses and to predict potential routes for newly emerged viruses. We first compile a comprehensive dataset of virus-host associations. Using this dataset, we employ a multi-perspective machine learning approach to achieve high predictive performance. Our framework ranks viral features by their contribution to prediction, revealing the genomic evolutionary signatures linked to each route. Our approach could provide early insights into viral spread patterns, facilitating prompt responses to new outbreaks and epidemics and streamlining laboratory investigations. Overall, our study represents a step forward in our ability to anticipate and mitigate the impact of emerging infectious diseases on human, animal, and plant health.
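As a rough illustration of how a hierarchy of routes and higher-order modes can yield one binary prediction target per node, the sketch below expands each virus's observed routes to include all ancestor modes and then builds a 0/1 label vector for every node. The route and mode names, the `PARENT` mapping, and both helper functions are hypothetical and chosen only for this example; upward propagation of labels is an assumption, not a description of the authors' pipeline. Routes that have not been observed are simply labelled 0 here, which is a simplification given that the framework is meant to recover missing routes.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy hierarchy (child -> parent); names are illustrative only.
PARENT: Dict[str, str] = {
    "mosquito-borne": "vector-borne",
    "tick-borne": "vector-borne",
    "aerosol": "respiratory",
    "droplet": "respiratory",
}


def expand_routes(observed: Iterable[str], parent: Dict[str, str]) -> Set[str]:
    """Return the observed routes plus every ancestor mode in the hierarchy."""
    labels: Set[str] = set()
    for route in observed:
        node: Optional[str] = route
        # Walk up until the root is passed or an already-labelled node is hit.
        while node is not None and node not in labels:
            labels.add(node)
            node = parent.get(node)
    return labels


def binary_targets(
    observed_by_virus: Dict[str, List[str]], parent: Dict[str, str]
) -> Dict[str, List[int]]:
    """Build one 0/1 target vector per node; viruses are taken in sorted order."""
    nodes = sorted(set(parent) | set(parent.values()))
    targets: Dict[str, List[int]] = {node: [] for node in nodes}
    for virus in sorted(observed_by_virus):
        labels = expand_routes(observed_by_virus[virus], parent)
        for node in nodes:
            targets[node].append(1 if node in labels else 0)
    return targets


if __name__ == "__main__":
    example = {"virus_a": ["mosquito-borne"], "virus_b": ["aerosol", "droplet"]}
    for node, vector in binary_targets(example, PARENT).items():
        print(node, vector)
```

Each resulting vector could serve as the target for a separate binary classifier, which is one way to read "independent ensembles" per route or mode; the feature matrix and the classifier itself are left out of this sketch.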

Features that matter: Evolutionary signatures can predict viral transmission routes