DeepSS2GO: protein function prediction from secondary structure

Fu V Song, Jiaqi Su, Sixing Huang, Neng Zhang, Kaiyue Li, Ming Ni, Maofu Liao
DOI: https://doi.org/10.1093/bib/bbae196
IF: 9.5
2024-05-05
Briefings in Bioinformatics
Abstract: Predicting protein function is crucial for understanding biological life processes, preventing diseases and developing new drug targets. In recent years, methods based on sequence, structure and biological networks for protein function annotation have been extensively researched. Although obtaining a protein in three-dimensional structure through experimental or computational methods enhances the accuracy of function prediction, the sheer volume of proteins sequenced by high-throughput technologies presents a significant challenge. To address this issue, we introduce a deep neural network model DeepSS2GO (Secondary Structure to Gene Ontology). It is a predictor incorporating secondary structure features along with primary sequence and homology information. The algorithm expertly combines the speed of sequence-based information with the accuracy of structure-based features while streamlining the redundant data in primary sequences and bypassing the time-consuming challenges of tertiary structure analysis. The results show that the prediction performance surpasses state-of-the-art algorithms. It has the ability to predict key functions by effectively utilizing secondary structure information, rather than broadly predicting general Gene Ontology terms. Additionally, DeepSS2GO predicts five times faster than advanced algorithms, making it highly applicable to massive sequencing data. The source code and trained models are available at https://github.com/orca233/DeepSS2GO.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses how to use protein secondary structure information efficiently and accurately in protein function prediction. It proposes a method named DeepSS2GO, which aims to improve both the accuracy and the speed of function prediction by combining secondary structure features, primary sequence information and homology alignment information. The method is especially suited to the large volumes of protein sequence data generated by high-throughput sequencing technologies, and it predicts protein function without relying on time-consuming three-dimensional structure analysis.

### Main problems

1. **Challenges in protein function prediction**:
   - High-throughput sequencing technologies have generated a large amount of protein sequence data, and experimental methods cannot keep pace with the need for rapid functional annotation.
   - Traditional sequence-based methods are fast but have low accuracy, while three-dimensional structure-based methods are accurate but computationally costly and time-consuming.
2. **Deficiencies of existing methods**:
   - Sequence-based methods scale to large datasets, but their accuracy is limited.
   - Three-dimensional structure-based methods are accurate, but their high computational complexity makes them difficult to apply to large-scale data.
   - Existing methods have limited generalization ability in cross-species prediction.

### Solutions

- **DeepSS2GO**: The method increases prediction speed while maintaining high accuracy by introducing protein secondary structure information and combining it with primary sequence and homology alignment information.
- **Secondary structure information**: The secondary structure of each protein is predicted and structural features are extracted from it. These features are highly conserved across species, which helps to improve the accuracy of cross-species prediction.
- **Primary sequence information**: The primary sequence supplements the secondary structure features and makes the prediction more comprehensive.
- **Homology alignment information**: Homology alignment with the Diamond algorithm retrieves the functions of homologous proteins and further improves prediction accuracy.

### Main contributions

1. **Performance improvement**:
   - The prediction performance of DeepSS2GO on the Molecular Function Ontology (MFO), Cellular Component Ontology (CCO) and Biological Process Ontology (BPO) exceeds that of existing advanced algorithms.
   - On the minimum semantic distance (Smin) metric in particular, DeepSS2GO ranks first in all three sub-ontologies.
2. **Computational efficiency**:
   - DeepSS2GO predicts five times faster than existing advanced algorithms, which makes it suitable for rapid processing of large-scale data.
3. **Cross-species generalization ability**:
   - Cross-species tests verify that DeepSS2GO generalizes across species, and it performs particularly well in predicting the key functions of non-homologous proteins.

### Conclusion

By combining secondary structure, primary sequence and homology alignment information, DeepSS2GO provides an efficient and accurate method for protein function prediction that is especially suited to the large volumes of protein sequence data generated by high-throughput sequencing technologies.
The method improves prediction accuracy and substantially improves computational efficiency, providing a new tool for protein function research.
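
The summary describes combining three information sources (secondary structure, primary sequence and homology alignment) but does not say how their outputs are merged. The sketch below shows one common option, a per-term weighted average of scores. It is an illustration under stated assumptions, not the DeepSS2GO implementation: the function name `fuse_scores`, the weights and the example scores are all hypothetical, and the GO identifiers serve only as sample keys.

```python
from typing import Dict

# Maps a GO term ID to a confidence score in [0, 1].
Scores = Dict[str, float]


def fuse_scores(
    ss_scores: Scores,
    aa_scores: Scores,
    homology_scores: Scores,
    w_ss: float = 0.4,   # hypothetical weight for secondary-structure scores
    w_aa: float = 0.2,   # hypothetical weight for primary-sequence scores
    w_hom: float = 0.4,  # hypothetical weight for homology-derived scores
) -> Scores:
    """Return a weighted average of the three score sets for every GO term.

    A term missing from one source contributes a score of 0.0 for that source.
    This fusion rule is an assumption made for illustration only.
    """
    total = w_ss + w_aa + w_hom
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    terms = set(ss_scores) | set(aa_scores) | set(homology_scores)
    return {
        term: (
            w_ss * ss_scores.get(term, 0.0)
            + w_aa * aa_scores.get(term, 0.0)
            + w_hom * homology_scores.get(term, 0.0)
        )
        / total
        for term in terms
    }


# Made-up scores for a single query protein.
example = fuse_scores(
    ss_scores={"GO:0003824": 0.8, "GO:0005515": 0.2},
    aa_scores={"GO:0003824": 0.6},
    homology_scores={"GO:0005515": 1.0},
)
for term, score in sorted(example.items()):
    print(term, round(score, 3))
```

Dividing by the weight total keeps the fused score in [0, 1] even when the weights do not sum to one. In practice such weights would be tuned on a validation set separately for MFO, CCO and BPO.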