Abstract

Background: It is well known that most of the binding free energy of protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental hot spot detection methods such as alanine scanning mutagenesis are not applicable on a large scale because they are time-consuming and expensive. Reliable and efficient computational methods for identifying hot spots are therefore needed.

Results: In this work, we introduce an efficient approach that uses support vector machines (SVMs) to predict hot spot residues in protein interfaces. We systematically investigate 62 features drawn from a combination of protein sequence and structure information. To remove redundant and irrelevant features and improve prediction performance, feature selection is then performed using the F-score method. Based on the selected features, nine individual-feature-based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, APIS (A combined model based on Protrusion Index and Solvent accessibility), is developed to further improve prediction accuracy. Results on two benchmark datasets, ASEdb and BID, show that the proposed method yields significantly better prediction accuracy than previously published methods. We also demonstrate the predictive power of the proposed method by modelling two protein complexes: the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex. The results indicate that our method identifies more hot spots in these two complexes than other state-of-the-art methods.

Conclusion: We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex.
A major contribution of this study is a set of new features based on the protrusion index of amino acid residues, which are shown to significantly improve hot spot prediction performance. Moreover, we identify a compact and useful feature subset that has important implications for identifying hot spot residues. Our results indicate that these features are more effective than conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that combining our features with traditional ones may support the creation of a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available at http://home.ustc.edu.cn/~jfxia/hotspot.html .
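The F-score feature selection step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so this assumes the F-score definition commonly used with SVMs (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances). The function names `f_score` and `rank_features` and the toy data are hypothetical and are not taken from the authors' code.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class.

    Assumed definition (not stated in the abstract): between-class separation
    of the class means divided by the sum of within-class sample variances.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        # Degenerate case: no spread inside either class.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return (feature_index, score) pairs sorted by decreasing F-score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((j, f_score(pos, neg)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): rows are interface residues, columns are features,
# label 1 marks a hot spot and 0 a non-hot spot.
X = [
    [0.9, 0.2],
    [0.8, 0.7],
    [0.1, 0.3],
    [0.2, 0.6],
]
y = [1, 1, 0, 0]

ranking = rank_features(X, y)
for index, score in ranking:
    print(f"feature {index}: F-score = {score:.3f}")
```

In this toy example, feature 0 separates the two classes and ranks first, while feature 1 has identical class means and scores near zero. In practice, the top-ranked features would be kept and passed to the SVM; the cutoff is a modelling choice that the abstract does not specify.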

Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins.

Predicting protein structure from long-range contacts.

Sequence-Based Prediction of Protein-Protein Binding Residues in Alpha-Helical Membrane Proteins.

Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks

Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors.

APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility

Hmm-Based Prediction for Protein Structural Motifs' Two Local Properties: Solvent Accessibility and Backbone Torsion Angles

Improving Prediction of Residue Solvent Accessibility with SVR and Multiple Sequence Alignment Profile

SPOT-1D-Single: improving the single-sequence-based prediction of protein secondary structure, backbone angles, solvent accessibility and half-sphere exposures using a large training set and ensembled deep learning

Prediction of protein structural classes for low-homology sequences based on predicted secondary structure

SemiHS: an Iterative Semi-Supervised Approach for Predicting Protein-Protein Interaction Hot Spots.

Prediction of Heme Binding Residues from Protein Sequences with Integrative Sequence Profiles

Prediction of Heme Binding Sites in Heme Proteins Using an Integrative Sequence Profile Coupling Evolutionary Information with Physicochemical Properties.

PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations

Predicting protein secondary structure and solvent accessibility with an improved multiple linear regression method.

CSSP-2.0: A refined consensus method for accurate protein secondary structure prediction

Solvent Accessibility Promotes Rotamer Errors during Protein Modeling with Major Side-Chain Prediction Programs

Prediction Of Hot Spots Based On Physicochemical Features And Relative Accessible Surface Area Of Amino Acid Sequence

E-pRSA: Embeddings Improve the Prediction of Residue Relative Solvent Accessibility in Protein Sequence

Assessment of hydrophobicity scales for protein stability and folding using energy and RMSD criteria

Prediction of Relative Solvent Accessibility Using Support Vector Regression