Specific and intrinsic sequence patterns extracted by deep learning from intra-protein binding and non-binding peptide fragments

Yuhong Wang, Junzhou Huang, Wei Li, Sheng Wang, Chuanfan Ding
DOI: https://doi.org/10.1038/s41598-017-14877-w
IF: 4.6
2017-01-01
Scientific Reports
Abstract: The key finding in the DNA double helix model is the specific pairing, or binding, between the nucleotides A-T and C-G, and these pairing rules are the molecular basis of the genetic code. Unfortunately, no such rules have been discovered for proteins. Here we show that intrinsic sequence patterns exist between intra-protein binding peptide fragments, that they can be extracted using a deep learning algorithm, and that they bear an interesting resemblance to the DNA double helix model. The intra-protein binding peptide fragments have specific and intrinsic sequence patterns, distinct from those of non-binding peptide fragments, and millions of binding and non-binding peptide fragments from currently available protein X-ray structures are classified with an accuracy of up to 93%. The specific binding between short peptide fragments may provide an important driving force for protein folding and protein-protein interaction, two open and fundamental problems in molecular biology, and it may have significant potential in the design, discovery, and development of peptide, protein, and antibody drugs.