SemiHS: an Iterative Semi-Supervised Approach for Predicting Protein-Protein Interaction Hot Spots.

Lei Deng, Ji-Hong Guan, Qi-Wen Dong, Shui-Geng Zhou
DOI: https://doi.org/10.2174/092986611796011419
2011-01-01
Protein and Peptide Letters
Abstract: Protein-protein interaction hot spots, as revealed by alanine scanning mutagenesis, make dominant contributions to the free energy of binding. Since mutagenesis experiments are expensive and time-consuming, the development of computational methods to identify hot spots is becoming increasingly important. In this study, using a new combination of sequence, structure and energy features, we propose an iterative semi-supervised algorithm, SemiHS, that incorporates unlabeled data to improve the accuracy of hot spot prediction when sufficient training data are unavailable and to overcome the imbalanced data problem. We evaluate the predictive power of SemiHS on a labeled set of 265 alanine-mutated interface residues in 17 complexes and a large unlabeled set of 2465 interface residues with 10-fold cross-validation, and obtain an AUC score of 0.85, with a sensitivity of 0.70 and a specificity of 0.87, which are better than those of existing methods. Moreover, we validate the proposed method with an independent test and obtain encouraging results.
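The abstract does not spell out the SemiHS procedure, so the following is only a minimal sketch of generic self-training, one common form of iterative semi-supervised learning, and not the authors' algorithm. It assumes a toy nearest-centroid classifier; the function names, the `margin` confidence threshold and the `max_rounds` limit are all hypothetical choices made for illustration. In each round the classifier is refit on the labeled set, the unlabeled examples it labels with enough confidence are moved into the labeled set, and the loop stops when no confident examples remain.

```python
from statistics import mean

def centroid(points):
    # Component-wise mean of a list of equal-length feature vectors.
    return [mean(col) for col in zip(*points)]

def sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Generic self-training with a nearest-centroid classifier (illustrative only).

    labeled:   list of (features, label) pairs, label in {0, 1}
               (1 could stand for "hot spot", 0 for "non-hot spot").
    unlabeled: list of feature vectors.
    margin:    hypothetical confidence threshold on the gap between the
               squared distances to the two class centroids.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Refit the classifier on the current labeled set.
        c0 = centroid([x for x, y in labeled if y == 0])
        c1 = centroid([x for x, y in labeled if y == 1])
        confident, rest = [], []
        for x in pool:
            d0, d1 = sq_dist(x, c0), sq_dist(x, c1)
            if abs(d0 - d1) >= margin:
                # Confident prediction: adopt it as a pseudo-label.
                confident.append((x, 1 if d1 < d0 else 0))
            else:
                rest.append(x)
        if not confident:
            break  # nothing left that the classifier is sure about
        labeled.extend(confident)
        pool = rest
    return labeled, pool

# Toy usage: two labeled points and three unlabeled points in one dimension.
seed = [([0.0], 0), ([4.0], 1)]
grown, remaining = self_train(seed, [[0.5], [3.5], [2.0]])
```

In this toy run the two unlabeled points near the class centroids are pseudo-labeled, while the point midway between them stays unlabeled. A real application would replace the centroid classifier with a stronger model and would need an explicit strategy for class imbalance, which this sketch does not address.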