Amino acid torsion angles enable prediction of protein fold classification

Kun Tian, Xin Zhao, Xiaogeng Wan, Stephen S.-T. Yau
DOI: https://doi.org/10.1038/s41598-020-78465-1
IF: 4.6
2020-12-01
Scientific Reports
Abstract: Protein structure can provide insights that help biologists to predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques used to determine the structure of proteins are complex and require a lot of time to analyze the experimental results, especially for large protein molecules. The limitations of these methods have motivated us to create a new approach for protein structure prediction. Here we describe a new approach to predict protein structures and structure classes from amino acid sequences. Our prediction model performs well in comparison with previous methods when applied to the structural classification of two CATH datasets with more than 5000 protein domains. The average accuracy is 92.5% for structure classification, which is higher than that of previous research. We also used our model to predict four known protein structures with a single amino acid sequence, while many other existing methods could only obtain one possible structure for a given sequence. The results show that our method provides a new effective and reliable tool for protein structure prediction research.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses how to predict the three-dimensional structure of proteins from amino acid sequences. Specifically, the authors propose a method that predicts the fold classification and structure of a protein by analyzing the torsion angles of its amino acids. The method is motivated by the complexity and long analysis times of experimental structure determination, especially for large protein molecules, and it can predict multiple possible structures from a single amino acid sequence.

### Main research questions:

1. **Improve the accuracy and speed of protein structure prediction**: Experimental structure determination requires lengthy analysis of results, especially for large protein molecules, so the number of known structures has not kept pace with the number of known sequences. This motivates a faster and more accurate way to predict structures directly from sequence.
2. **Predict multiple protein structures from amino acid sequences**: Many existing methods return only the single most likely structure for a given sequence. The method proposed in this paper can predict several different structures from one amino acid sequence.

### Research background:

- **Importance of protein structure**: The three-dimensional structure of a protein is closely related to its function, so predicting structure is crucial for understanding protein functions and interactions.
- **Limitations of existing technologies**: Commonly used structure determination techniques (such as X-ray crystallography and nuclear magnetic resonance spectroscopy) are complex and time-consuming, especially for large protein molecules.
- **Application of machine learning**: In recent years, advances in machine learning have produced many new protein structure prediction methods, but these methods still have limitations.

### Method innovation points:

- **Use amino acid torsion angle information**: The proposed method is based on the torsion angles associated with the amino acids in a sequence, and predicts protein structures by analyzing a large amount of torsion angle data.
- **Multi-structure prediction**: The method can predict not only the most likely structure but also multiple different structures of the same sequence, which helps in understanding the conformational variability of proteins.

### Experimental verification:

- **CATH database test**: The authors tested the method on two CATH protein structure classification datasets containing more than 5000 protein domains. The average structure classification accuracy was 92.5%, which the authors report is higher than that of previous methods.
- **Single-sequence multi-structure prediction**: From a single sequence of 148 amino acids, the method predicted four known protein structures. The results were compared with existing methods (such as RaptorX and I-TASSER), which typically return one possible structure per sequence.

### Conclusion:

According to the reported results, the proposed method performs well in protein structure prediction. It achieves high classification accuracy and can predict multiple different protein structures from a single amino acid sequence, which makes it a new and effective tool for protein structure prediction research.
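
For readers unfamiliar with torsion angles, the following is a minimal sketch of how a single torsion (dihedral) angle can be computed from four consecutive atom positions. It is not the authors' pipeline: the function name `dihedral_degrees` and the toy coordinates are assumptions made for illustration, and the summary above does not specify how the paper derives or encodes its angles.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    Hypothetical helper for illustration; not taken from the paper.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 coincide; the central bond is undefined")
    b1 = tuple(x / norm for x in b1)

    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))

    # atan2 gives a signed angle in (-180, 180].
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real protein data): four points forming a right-angle twist.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

In the standard backbone convention, the same function would give phi when passed the C(i-1), N(i), CA(i), C(i) atoms of a residue, and psi when passed N(i), CA(i), C(i), N(i+1).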