Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins.

Rhys Heffernan, Abdollah Dehzangi, James G. Lyons, Kuldip K. Paliwal, Alok Sharma, Jihua Wang, Abdul Sattar, Yaoqi Zhou, Yuedong Yang
DOI: https://doi.org/10.1093/bioinformatics/btv665
IF: 5.8
2016-01-01
Bioinformatics
Abstract:
Motivation: Solvent exposure of amino acid residues plays an important role in understanding and predicting protein structure, function and interactions. It can be characterized by several measures, including solvent accessible surface area (ASA), residue depth (RD) and contact number (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced. HSE separates the contacts into up and down half spheres, defined by either the Cα-Cβ vector (HSEβ) or the neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to describe solvent exposure better than ASA, CN and RD in many applications. Because most proteins do not have experimentally determined structures, a sequence-based prediction is desirable. To the best of our knowledge, no method exists to predict HSEα and only one method exists to predict HSEβ.

Results: This study developed a novel method (SPIDER-HSE) for predicting both HSEα and HSEβ that achieved consistent performance in 10-fold cross validation and on two independent test sets. For the independent test set of 1199 proteins, the correlation coefficients between predicted and measured HSEβ (0.73 for the up half sphere, 0.69 for the down half sphere and 0.76 for contact number) are significantly higher than those of existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) with the stability change caused by residue mutations than predicted HSEβ (0.37) and ASA (0.43). These results, together with the simplicity of its Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as for function prediction.
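To make the CN and HSE definitions concrete, here is a minimal sketch of how these measures can be computed from known coordinates. It is not the SPIDER-HSE code, which predicts the values from sequence alone. Assumptions made for illustration: neighbours are counted as Cα atoms within a 13 Å sphere; a neighbour is "up" when its offset vector has a positive projection onto the reference direction; and for HSEα the reference direction is approximated by summing the unit vectors pointing from the two flanking Cα atoms toward the central Cα. All function names are hypothetical, and the coordinates at the end are a toy example rather than a real protein.

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def unit(a: Vec) -> Vec:
    # Assumes a is non-zero (consecutive Cα atoms at distinct positions).
    n = norm(a)
    return (a[0] / n, a[1] / n, a[2] / n)


def contact_number(ca: Sequence[Vec], i: int, radius: float = 13.0) -> int:
    """CN: number of other Cα atoms within `radius` (assumed 13 Å) of residue i."""
    return sum(1 for j, c in enumerate(ca) if j != i and norm(sub(c, ca[i])) < radius)


def half_sphere_counts(ca: Sequence[Vec], i: int, direction: Vec,
                       radius: float = 13.0) -> Tuple[int, int]:
    """Split the CN neighbours of residue i into (up, down) counts.

    Neighbour j is 'up' if the vector Cα_i -> Cα_j projects positively on `direction`.
    """
    up = down = 0
    for j, c in enumerate(ca):
        if j == i:
            continue
        d = sub(c, ca[i])
        if norm(d) >= radius:
            continue  # outside the sphere: not a contact
        if dot(d, direction) > 0:
            up += 1
        else:
            down += 1
    return up, down


def hse_beta(ca: Sequence[Vec], cb: Sequence[Optional[Vec]], i: int,
             radius: float = 13.0) -> Optional[Tuple[int, int]]:
    """HSEβ: half spheres defined by the real Cα->Cβ vector (None for glycine)."""
    if cb[i] is None:
        return None
    return half_sphere_counts(ca, i, sub(cb[i], ca[i]), radius)


def hse_alpha(ca: Sequence[Vec], i: int,
              radius: float = 13.0) -> Optional[Tuple[int, int]]:
    """HSEα: pseudo-Cβ direction built from the two flanking Cα atoms only."""
    if i == 0 or i == len(ca) - 1:
        return None  # chain termini lack one flanking Cα
    b = add(unit(sub(ca[i], ca[i - 1])), unit(sub(ca[i], ca[i + 1])))
    if norm(b) < 1e-8:
        return None  # locally straight chain: direction undefined
    return half_sphere_counts(ca, i, b, radius)


# Toy Cα coordinates (Å), chosen so the counts are easy to verify by hand.
ca = [(-1.0, -1.0, 0.0), (0.0, 0.0, 0.0), (1.0, -1.0, 0.0),
      (0.0, 5.0, 0.0), (0.0, -5.0, 0.0), (0.0, 20.0, 0.0)]
print(contact_number(ca, 1))  # 4 (the atom at y = 20 is outside 13 Å)
print(hse_alpha(ca, 1))       # (1, 3): one neighbour up, three down
```

By construction, the up and down counts always sum to CN, so HSE refines CN by orientation rather than replacing it. Because HSEα needs only Cα coordinates, it can also be computed for glycine and for Cα-only models, where the Cβ-based HSEβ is undefined. This is consistent with the abstract's point about its easy Cα-atom-based calculation.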