Abstract: Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and advancing drug discovery. To address this problem, extensive research efforts have been made to design more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, which results in low computational efficiency and limits predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, for protein-peptide binding residue prediction using protein sequence only. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models, which extract two different high-latent feature representations from protein sequences that are relevant to protein structure and function. A novel feature fusion module is then designed in E2EPep to fuse and optimize these two feature representations of binding residues. In addition, we design E2EPep+, which integrates the E2EPep and PepBCL models, to further improve prediction performance. Experimental results on two independent testing data sets demonstrate that E2EPep and E2EPep+ achieve average AUC values of 0.846 and 0.842, while achieving an average Matthews correlation coefficient that is significantly higher than that of most existing sequence-based methods and comparable to that of the state-of-the-art structure-based predictors. Detailed data analysis shows that the primary strength of E2EPep lies in the effectiveness of its feature representation, which uses a cross-attention mechanism to fuse the embeddings generated by the two fine-tuned protein language models. The standalone package of E2EPep and E2EPep+ can be obtained at https://github.com/ckx259/E2EPep.git for academic use only.
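
The fusion step described above can be illustrated with a minimal sketch of single-head scaled dot-product cross-attention, in which per-residue embeddings from one language model act as queries and embeddings from the other act as keys and values. The abstract does not specify the actual architecture of the E2EPep fusion module, so everything here is an assumption for illustration: the function names, the embedding dimensions, the single attention head, and the random projection matrices (which would be learned in a real model).

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(emb_a, emb_b, w_q, w_k, w_v):
    """Fuse two per-residue embeddings with single-head cross-attention.

    emb_a: (L, d_a) embeddings from hypothetical model A, used as queries.
    emb_b: (L, d_b) embeddings from hypothetical model B, used as keys/values.
    Returns the fused (L, d_model) representation and the (L, L) weights.
    """
    q = matmul(emb_a, w_q)
    k = matmul(emb_b, w_k)
    v = matmul(emb_b, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or embeddings."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for illustration only; real embeddings are much wider.
rng = random.Random(0)
seq_len, d_a, d_b, d_model = 5, 8, 6, 4
emb_a = random_matrix(seq_len, d_a, rng)
emb_b = random_matrix(seq_len, d_b, rng)
w_q = random_matrix(d_a, d_model, rng)
w_k = random_matrix(d_b, d_model, rng)
w_v = random_matrix(d_b, d_model, rng)

fused, attn = cross_attention(emb_a, emb_b, w_q, w_k, w_v)
```

Each row of `attn` sums to 1, and `fused` has one `d_model`-dimensional vector per residue, which a downstream classifier could map to a binding or non-binding label.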

Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism
