Abstract: Antimicrobial peptides (AMPs) are vital components of innate immunotherapy. Existing approaches mainly rely either on deep learning for the automatic extraction of sequence features or on traditional hand-crafted amino acid features combined with machine learning. Peptide sequences can contain symmetrical sequence motifs or repetitive amino acid patterns, which may be related to the function and structure of the peptide. Recently, the advent of large language models has significantly boosted the representational power of sequence pattern features. In light of this, we present a novel AMP predictor called UniproLcad, which integrates three prominent protein language models (ESM-2, ProtBert, and UniRep) to obtain a more comprehensive representation of protein features. UniproLcad uses deep learning networks, namely a bidirectional long short-term memory network (Bi-LSTM) and a one-dimensional convolutional neural network (1D-CNN), together with an attention mechanism. Coupled with the pre-trained language models, these networks extract multi-view features from antimicrobial peptide sequences and assign attention weights to them. In ten-fold cross-validation and independent testing, UniproLcad demonstrates competitive performance in antimicrobial peptide identification. This integration of diverse language models and deep learning architectures enhances the accuracy and reliability of AMP prediction, contributing to the advancement of computational methods in this field.
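To make the idea of assigning attention weights to multi-view features more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each language model's embedding has already been projected to a common length, and it uses a generic scaled dot-product score against a single query vector. The function name `fuse_views`, the toy vectors, and the `query` values are all hypothetical; in a trained model the query would be learned, and the abstract does not specify how UniproLcad computes its weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(views, query):
    """Attention-weighted sum of per-model feature vectors.

    views: dict mapping a view name to a feature vector (list of floats).
           All vectors are assumed to share the length of `query`.
    query: hypothetical scoring vector; it would be learned in practice.
    Returns (weights per view, fused vector).
    """
    dim = len(query)
    for name, vec in views.items():
        if len(vec) != dim:
            raise ValueError(f"view {name!r} has length {len(vec)}, expected {dim}")

    names = list(views)
    # Scaled dot-product score of each view against the query.
    scores = [
        sum(q * x for q, x in zip(query, views[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the view vectors, one output value per feature position.
    fused = [
        sum(w * views[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Toy, made-up vectors standing in for projected ESM-2 / ProtBert / UniRep features.
views = {
    "esm2": [0.2, 0.1, 0.4, 0.3],
    "protbert": [0.5, 0.0, 0.1, 0.2],
    "unirep": [0.1, 0.3, 0.3, 0.1],
}
query = [1.0, 0.0, 1.0, 0.0]
weights, fused = fuse_views(views, query)
```

The weights sum to one, so the fused vector is a convex combination of the three views; a view whose features align more closely with the query contributes more.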

What problem does this paper attempt to address?

The aim of this paper is to develop a more accurate and comprehensive method for predicting antimicrobial peptides (AMPs). The researchers address two limitations of existing AMP predictors: they do not fully capture the diversity of the data distribution, and a single protein language model gives an incomplete representation. Their proposed method is called UniproLcad. Its main contributions are:

1. **Integration of multiple protein language models**: The researchers combine three mainstream protein language models (ESM-2, ProtBert, and UniRep) to obtain a more comprehensive representation of protein features. The models are built on different architectures (a Transformer, a BERT-style Transformer, and a recurrent network, respectively), so they capture information from peptide sequences from multiple perspectives.
2. **Utilization of deep neural network structures**: To further improve performance, UniproLcad employs a bidirectional long short-term memory network (Bi-LSTM) and a one-dimensional convolutional neural network (1D-CNN), and uses an attention mechanism to focus on the most informative features.
3. **Capturing symmetry in peptide sequences**: Symmetrical motifs and repetitive patterns that may exist in peptide sequences may be related to the function and structure of peptides. The Bi-LSTM and 1D-CNN are intended to identify and extract these symmetrical features.
4. **Improving prediction accuracy and generalization ability**: In 10-fold cross-validation and on independent test sets, UniproLcad is competitive in AMP prediction, with the paper reporting improved accuracy and reliability relative to existing methods.

In summary, UniproLcad is a novel AMP prediction tool that integrates multiple protein language models with deep learning techniques, aiming to overcome the shortcomings of existing methods and provide more accurate and reliable predictions.
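As an illustration of the 10-fold cross-validation protocol mentioned above, here is a minimal sketch of how peptides could be partitioned into ten folds. It assumes stratified splitting (each fold keeps roughly the same AMP to non-AMP ratio) and a fixed random seed; the summary does not say whether the authors stratified, and `stratified_kfold` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds.

    labels: list of class labels, e.g. 1 for AMP and 0 for non-AMP.
    Stratification is an assumption made for this sketch.
    """
    rng = random.Random(seed)

    # Group sample indices by class so each class is spread evenly over folds.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin into the k folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves as the held-out test set exactly once.
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            idx
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for idx in fold
        )
        yield train, test


# Toy labels: 50 positives (AMP) followed by 50 negatives (non-AMP).
labels = [1] * 50 + [0] * 50
splits = list(stratified_kfold(labels, k=10))
```

Each sample appears in exactly one test fold, so averaging a metric over the ten held-out folds uses every peptide once for evaluation. An independent test set, as described in the summary, would be kept entirely outside this procedure.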

UniproLcad: Accurate Identification of Antimicrobial Peptide by Fusing Multiple Pre-Trained Protein Language Models

Ensemble Machine Learning and Predicted Properties Promote Antimicrobial Peptide Identification

iAMP-Attenpred: a novel antimicrobial peptide predictor based on BERT feature extraction method and CNN-BiLSTM-Attention combination model

Protein Language Models and Machine Learning Facilitate the Identification of Antimicrobial Peptides

SAMP: Identifying Antimicrobial Peptides by an Ensemble Learning Model Based on Proportionalized Split Amino Acid Composition

DMAMP: A deep-learning model for detecting antimicrobial peptides and their multi-activities

deepAMPNet: a novel antimicrobial peptide predictor employing AlphaFold2 predicted structures and a bi-directional long short-term memory protein language model

HMD-AMP: Protein Language-Powered Hierarchical Multi-label Deep Forest for Annotating Antimicrobial Peptides

EnAMP: A novel deep learning ensemble antibacterial peptide recognition algorithm based on multi-features

E-CLEAP: An ensemble learning model for efficient and accurate identification of antimicrobial peptides

dsAMP and dsAMPGAN: Deep Learning Networks for Antimicrobial Peptides Recognition and Generation

[An antibacterial peptides recognition method based on BERT and Text-CNN]

Identification of antimicrobial peptides from the human gut microbiome using deep learning

Deep learning improves antimicrobial peptide recognition

Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides

PGAT-ABPp: harnessing protein language models and graph attention networks for antibacterial peptide identification with remarkable accuracy

A novel antibacterial peptide recognition algorithm based on BERT

Integrating transformer and imbalanced multi-label learning to identify antimicrobial peptides and their functional activities

Comprehensive assessment of machine learning-based methods for predicting antimicrobial peptides

CalcAMP: A New Machine Learning Model for the Accurate Prediction of Antimicrobial Activity of Peptides

Accelerating Antimicrobial Peptide Discovery for WHO Priority Pathogens through Predictive and Interpretable Machine Learning Models