Abstract

Background
It is well known that most of the binding free energy of a protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding protein function and for studying protein interactions. Experimental hot spot detection methods, such as alanine scanning mutagenesis, are not applicable on a large scale because they are time-consuming and expensive. Therefore, reliable and efficient computational methods for identifying hot spots are urgently needed.

Results
In this work, we introduce an efficient approach that uses support vector machines (SVMs) to predict hot spot residues in protein interfaces. We systematically investigate a diverse set of 62 features derived from protein sequence and structure information. To remove redundant and irrelevant features and improve prediction performance, we then perform feature selection using the F-score method. Based on the selected features, we develop nine individual-feature-based SVM predictors to identify hot spots. Furthermore, we develop a new ensemble classifier, APIS (A combined model based on Protrusion Index and Solvent accessibility), to further improve prediction accuracy. Results on two benchmark datasets, ASEdb and BID, show that the proposed method achieves significantly better prediction accuracy than previously published methods. We also demonstrate the predictive power of the method by modelling two protein complexes, the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex; in both complexes, our method identifies more hot spots than other state-of-the-art methods.

Conclusion
We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex.
A major contribution of this study is a set of new features based on the protrusion index of amino acid residues, which significantly improve hot spot prediction performance. Moreover, we identify a compact and useful feature subset with important implications for identifying hot spot residues. Our results indicate that these features are more effective than the conventional evolutionary conservation, pairwise residue potential and other traditional features considered previously, and that combining our features with traditional ones may yield a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available at http://home.ustc.edu.cn/~jfxia/hotspot.html .
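The abstract does not define the protrusion index itself. The sketch below assumes the widely used CX measure: for each heavy atom, count the atoms inside a sphere of radius 10 Å, treat that count times a mean atomic volume (assumed here to be 20.1 Å³) as the "internal" volume, and report the ratio of the remaining sphere volume to it. Averaging atom values into one value per residue is a hypothetical aggregation chosen for illustration, not necessarily what APIS does.

```python
import math

SPHERE_RADIUS = 10.0     # Å; assumed sphere radius (CX-style default)
MEAN_ATOM_VOLUME = 20.1  # Å^3; assumed mean volume of one heavy atom


def cx_values(coords, radius=SPHERE_RADIUS, atom_volume=MEAN_ATOM_VOLUME):
    """Return a CX-style protrusion value for every atom.

    coords: list of (x, y, z) heavy-atom coordinates in Å.
    Larger values mean the atom sits in a sparser, more protruding environment.
    """
    sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
    r2 = radius * radius
    values = []
    for ax, ay, az in coords:
        # Count atoms inside the sphere, including the central atom itself.
        n = sum(1 for bx, by, bz in coords
                if (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2 <= r2)
        internal = n * atom_volume
        external = sphere_volume - internal
        values.append(external / internal)
    return values


def residue_protrusion(coords, residue_ids):
    """Average atom CX values per residue (hypothetical aggregation)."""
    totals = {}
    for rid, value in zip(residue_ids, cx_values(coords)):
        s, c = totals.get(rid, (0.0, 0))
        totals[rid] = (s + value, c + 1)
    return {rid: s / c for rid, (s, c) in totals.items()}
```

An isolated atom gets a higher value than an atom packed among neighbours, which is the property a protrusion-based hot spot feature relies on. A real pipeline would read coordinates from a PDB file and use a spatial index instead of this quadratic loop.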

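The F-score feature selection step mentioned in the Results can be illustrated with a minimal sketch. It assumes the common Fisher-like F-score: the squared distances of the hot spot and non-hot-spot class means from the overall mean, divided by the sum of the two within-class sample variances. Features are then ranked by decreasing score. The function names are hypothetical, and the cut-off used to keep the top-ranked features is not specified here.

```python
from statistics import mean


def f_score(pos, neg):
    """F-score of one feature; pos/neg need at least two values each."""
    m_all, m_pos, m_neg = mean(pos + neg), mean(pos), mean(neg)
    # Numerator: how far each class mean lies from the overall mean.
    between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    # Denominator: sum of the two within-class sample variances.
    within = (sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1))
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Rank feature indices by decreasing F-score.

    X: list of samples (each a list of feature values);
    y: labels, 1 for hot spot and 0 for non-hot spot.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


X = [[1.0, 5.0], [1.2, 3.0], [0.1, 4.0], [0.0, 6.0]]
y = [1, 1, 0, 0]
order, scores = rank_features(X, y)
print(order)  # [0, 1]: feature 0 separates the classes far better
```

Because the score is computed for each feature independently, it is cheap, but it ignores interactions between features. This is one reason a score-based ranking is usually followed by training classifiers (here, SVMs) on the top-ranked subsets.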
APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility

Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features

SemiHS: an Iterative Semi-Supervised Approach for Predicting Protein-Protein Interaction Hot Spots.

A two-step ensemble learning for predicting protein hot spot residues from whole protein sequence

aPRBind: protein–RNA interface prediction by combining sequence and I-TASSER model-based structural features learned with convolutional neural networks

Prediction of Protein‒DNA Interface Hot Spots Based on Empirical Mode Decomposition and Machine Learning

Predicting Hot Spots Using a Deep Neural Network Approach

Protein-DNA interface hotspots prediction based on fusion features of embeddings of protein language model and handcrafted features

A Sequence-segment Neighbor Encoding Schema for Protein Hotspot Residue Prediction

A semi-supervised boosting SVM for predicting hot spots at protein-protein Interfaces

Effective Identification Of Hot Spots In Ppis Based On Ensemble Learning

An updated dataset and a structure‐based prediction model for protein–RNA binding affinity

An improved DNA-binding hot spot residues prediction method by exploring interfacial neighbor properties

Prediction of Protein-Protein Interaction Sites Using an Ensemble Method

Predicting Protein-Rna Interaction Amino Acids Using Random Forest Based on Submodularity Subset Selection

Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning