Abstract

BACKGROUND: Ligand-binding proteins play key roles in many biological processes, and identifying protein-ligand binding residues is important for understanding the biological functions of proteins. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based, and all of them rely on traditional machine learning. In a range of binding residue prediction tasks, 3D-structure-based methods generally outperform sequence-based methods. However, because far more proteins have known amino acid sequences than known structures, sequence-based methods have considerable room for improvement as deep learning develops. Predicting protein-ligand binding residues with deep learning therefore merits investigation.

RESULTS: In this study, we propose a new sequence-based approach, DeepCSeqSite, for ab initio protein-ligand binding residue prediction. DeepCSeqSite comes in a standard edition and an enhanced edition. Its classifier is a deep convolutional neural network in which several convolutional layers are stacked on top of each other to extract hierarchical features. The effective context scope grows as more convolutional layers are stacked, so long-distance dependencies between residues can be captured, and the number of stacked layers precisely controls the maximum dependency length. The extracted features are finally combined through 1×1 convolution kernels and softmax to predict whether each residue is a binding residue. The state-of-the-art ligand-binding site prediction method COACH and several of its submethods are used as baselines, and all methods are tested on a set of 151 nonredundant proteins and three extended test sets. Experiments show that DeepCSeqSite improves the Matthews correlation coefficient (MCC) by at least 0.05 over the baselines. In addition, we discuss a training data augmentation method that slightly improves performance.

CONCLUSIONS: Without using any templates that include 3D-structure data, DeepCSeqSite significantly outperforms existing sequence-based and 3D-structure-based methods, including COACH. Augmenting the training sets slightly improves performance. The model, code and datasets are available at https://github.com/yfCuiFaith/DeepCSeqSite .
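The classifier described in the abstract (stacked convolutions whose effective context scope grows with depth, followed by 1×1 convolutions and softmax per residue) can be illustrated with a small, untrained, pure-Python sketch. For stride-1, undilated convolutions with kernel size k, each extra layer widens the context by k - 1 residues, so L layers see 1 + L(k - 1) consecutive residues. The kernel size, layer count, channel width, ReLU activation, zero padding at the sequence ends, random weights, and every function name here (`effective_context`, `conv1d_same`, `predict_binding_probs`) are illustrative assumptions, not DeepCSeqSite's actual configuration or code.

```python
import math
import random


def effective_context(num_layers: int, kernel_size: int) -> int:
    """Residues that can influence one output after stacking `num_layers`
    stride-1, undilated 1D convolutions of width `kernel_size`."""
    return 1 + num_layers * (kernel_size - 1)


def conv1d_same(x, weights, bias, relu=True):
    """'Same'-padded 1D convolution over a residue sequence.

    x:       list of L residues, each a list of C_in features
    weights: nested list [C_out][kernel_size][C_in]
    bias:    list of C_out values
    Positions outside the sequence are treated as zero vectors.
    """
    length, k = len(x), len(weights[0])
    if k % 2 == 0:
        raise ValueError("odd kernel size assumed for symmetric padding")
    half = k // 2
    out = []
    for i in range(length):
        row = []
        for o, w_o in enumerate(weights):
            s = bias[o]
            for t in range(k):
                j = i + t - half
                if 0 <= j < length:
                    s += sum(wv * xv for wv, xv in zip(w_o[t], x[j]))
            row.append(max(0.0, s) if relu else s)
        out.append(row)
    return out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def random_conv(c_out, k, c_in, rng):
    """Hypothetical random (untrained) weights, for illustration only."""
    w = [[[rng.gauss(0.0, 0.1) for _ in range(c_in)] for _ in range(k)]
         for _ in range(c_out)]
    return w, [0.0] * c_out


def predict_binding_probs(features, num_layers=3, kernel_size=5, hidden=8, seed=0):
    """Per-residue probability of class 1 ('binding', by assumption)."""
    rng = random.Random(seed)
    h = features
    for _ in range(num_layers):  # stacked convolutions widen the context
        w, b = random_conv(hidden, kernel_size, len(h[0]), rng)
        h = conv1d_same(h, w, b)
    # A 1x1 convolution has width 1: it only mixes channels within a residue.
    w, b = random_conv(2, 1, hidden, rng)
    logits = conv1d_same(h, w, b, relu=False)
    return [softmax(z)[1] for z in logits]


if __name__ == "__main__":
    rng = random.Random(1)
    features = [[rng.random() for _ in range(4)] for _ in range(10)]  # 10 residues, 4 toy features
    probs = predict_binding_probs(features)
    print(len(probs), effective_context(3, 5))  # prints "10 13"
```

With three layers of width 5, each residue's prediction depends on at most 13 consecutive residues, and each additional layer widens that window by a fixed 4 residues. This is one way to read the abstract's point that depth controls the maximum dependency length.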
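The comparison with COACH is reported as a Matthews correlation coefficient (MCC) over binary per-residue labels, defined as MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal pure-Python sketch of that standard definition follows. The function names, the 1 = binding / 0 = non-binding encoding, the convention of returning 0.0 when a marginal count is zero, and pooling over all residues rather than averaging per protein are assumptions; the abstract does not say how MCC is aggregated across the test sets.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary residue labels (1 = binding, assumed)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient, pooled over all residues given."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 1]
    print(round(mcc(y_true, y_pred), 4))  # prints 0.1667
```

For the example labels, TP = 1, TN = 2, FP = 1 and FN = 1, so MCC = (2 - 1) / sqrt(2·2·3·3) = 1/6. Because MCC uses all four confusion-matrix cells, it stays informative when binding residues are a small minority of all residues.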

Predicting Protein-Ligand Binding Residues with Deep Convolutional Neural Networks
