PGAT-ABPp: harnessing protein language models and graph attention networks for antibacterial peptide identification with remarkable accuracy

Yuelei Hao,Xuyang Liu,Haohao Fu,Xueguang Shao,Wensheng Cai
DOI: https://doi.org/10.1093/bioinformatics/btae497
IF: 5.8
2024-08-02
Bioinformatics
Abstract:Motivation: The emergence of drug-resistant pathogens represents a formidable challenge to global health. Using computational methods to identify the antibacterial peptides (ABPs), an alternative antimicrobial agent, has demonstrated advantages in further drug design studies. Most of the current approaches, however, rely on handcrafted features and underutilize structural information, which may affect prediction performance. Results: To present an ultra-accurate model for ABP identification, we propose a novel deep learning approach, PGAT-ABPp. PGAT-ABPp leverages structures predicted by AlphaFold2 and a pretrained protein language model, ProtT5-XL-U50 (ProtT5), to construct graphs. Then the graph attention network (GAT) is adopted to learn global discriminative features from the graphs. PGAT-ABPp outperforms the other fourteen state-of-the-art models in terms of accuracy, F1-score and Matthews Correlation Coefficient on the independent test dataset. The results show that ProtT5 has significant advantages in the identification of ABPs and the introduction of spatial information further improves the prediction performance of the model. The interpretability analysis of key residues in known active ABPs further underscores the superiority of PGAT-ABPp. Availability and implementation: The datasets and source codes for the PGAT-ABPp model are available at https://github.com/moonseter/PGAT-ABPp/.
What problem does this paper attempt to address?