CoMPHI: A Novel Composite Machine Learning Approach Utilizing Multiple Feature Representation to Predict Hosts of Bacteriophages
Shreyashi Bodaka, Onkar Malgonde
DOI: https://doi.org/10.1101/2024.07.29.604684
2024-08-02
Abstract: Phage therapy has reemerged as a compelling alternative to antibiotics for treating bacterial infections, especially those caused by superbugs that have developed antibiotic resistance. A key challenge to the broader application of phage therapy is identifying host targets for the vast array of uncharacterized phages obtained through next-generation sequencing. To address this issue, this paper introduces a Composite Model for Phage Host Interaction, CoMPHI, which predicts phage-host interactions by combining the accuracy of alignment-based methods with the efficiency and flexibility of machine learning techniques. The model first generates multiple feature encodings from the nucleotide and protein sequences of both phages and hosts to enhance prediction accuracy. It is then enriched with phage-phage, phage-host, and host-host alignment scores, creating a composite model. During 5-fold cross-validation, the composite model achieved an Area Under the ROC Curve (AUC) of 94%, 96.4%, 96.5%, 96.6%, 96.6%, and 96.7% and an accuracy of 92.3%, 93.3%, 93.6%, 94%, 94.9%, and 95.1% at the Species, Genus, Family, Order, Class, and Phylum levels, respectively. A comparative analysis revealed a 6-8% increase in model performance due to the inclusion of alignment scores. Additionally, an ablation study showed that including both nucleotide and protein sequences from both phages and hosts increased the prediction accuracy of the model. Another ablation study provided evidence that phage-host and host-host alignment scores, combined with phage-phage scores, contributed equally to enhancing the composite model's performance. In conclusion, this paper presents a robust and comprehensive composite model that advances the use of phage therapy in modern medicine.
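The idea of a composite feature vector described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific encodings or the alignment tool, so everything here is an assumption made for illustration: normalized k-mer frequencies stand in for one possible nucleotide encoding, the alignment scores are placeholder numbers assumed to be computed elsewhere, and the function names `kmer_frequencies` and `composite_features` are hypothetical rather than part of CoMPHI.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Assumption: k-mer frequency is only one example of a sequence encoding;
    the abstract does not state which encodings CoMPHI uses.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous symbols such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def composite_features(phage_seq, host_seq, alignment_scores, k=3):
    """Concatenate phage and host encodings with precomputed alignment scores.

    Assumption: alignment_scores is a short list of numbers (for example
    phage-phage, phage-host, and host-host scores) produced by an external
    alignment step that is not shown here.
    """
    return (
        kmer_frequencies(phage_seq, k)
        + kmer_frequencies(host_seq, k)
        + list(alignment_scores)
    )


# Toy sequences and placeholder scores, not real data.
features = composite_features(
    "ATGCGTACGTTAGC", "GGCATGCGTACCTA", [0.82, 0.10, 0.35], k=2
)
print(len(features))  # 16 phage k-mers + 16 host k-mers + 3 scores = 35
```

A vector built this way could be passed to any standard classifier; protein sequences could be encoded the same way by supplying the 20-letter amino acid alphabet.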
Bioinformatics