Detecting Proline and Non-Proline Cis Isomers in Protein Structures from Sequences Using Deep Residual Ensemble Learning

Jaswinder Singh, Jack Hanson, Rhys Heffernan, Kuldip Paliwal, Yuedong Yang, Yaoqi Zhou
DOI: https://doi.org/10.1021/acs.jcim.8b00442
IF: 6.162
2018-01-01
Journal of Chemical Information and Modeling
Abstract: It has long been established that cis conformations of amino acid residues play many biologically important roles despite their rare occurrence in protein structures. Because of this rarity, few methods have been developed for predicting cis isomers from protein sequences, and most of those are based on outdated datasets and lack the means for independent testing. In this work, using a database of >10000 high-resolution protein structures, we update the statistics of cis isomers and develop a sequence-based prediction technique using an ensemble of residual convolutional and long short-term memory bidirectional recurrent neural networks that allows learning from the whole protein sequence. We show that ensembling eight neural network models yields maximum Matthews correlation coefficient values of approximately 0.35 for cis-Pro isomers and 0.1 for cis-nonPro residues. The method should be useful for prioritizing functionally important residues in cis isomers for experimental validation and for improving the sampling of rare protein conformations in ab initio protein structure prediction.
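The abstract reports performance as a Matthews correlation coefficient (MCC) obtained by ensembling eight models. As a rough illustration only, the sketch below averages per-residue cis probabilities from several models and scores the thresholded result with MCC. The averaging rule, the 0.5 threshold, the function names, and the toy numbers are all assumptions made for this example; the abstract does not say how the authors combine their models or choose a cutoff.

```python
import math


def ensemble_average(prob_lists):
    """Average per-residue probabilities across models.

    prob_lists: one list of cis probabilities per model, all the same length.
    Simple averaging is an assumption, not the paper's stated method.
    """
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = cis, 0 = trans)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: three hypothetical models scoring six residues.
model_probs = [
    [0.9, 0.1, 0.2, 0.1, 0.4, 0.0],
    [0.8, 0.2, 0.1, 0.3, 0.3, 0.1],
    [0.7, 0.0, 0.3, 0.2, 0.2, 0.2],
]
true_labels = [1, 0, 0, 0, 1, 0]

avg_probs = ensemble_average(model_probs)
pred_labels = [1 if p >= 0.5 else 0 for p in avg_probs]  # 0.5 cutoff is an assumption
score = mcc(true_labels, pred_labels)
```

MCC suits this task because cis residues are rare: a predictor that labels every residue trans achieves high accuracy but an MCC of 0.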