A comprehensive computational benchmark for evaluating deep learning-based protein function prediction approaches

Wenkang Wang, Yunyan Shuai, Qiurong Yang, Fuhao Zhang, Min Zeng, Min Li
DOI: https://doi.org/10.1093/bib/bbae050
IF: 9.5
2024-01-22
Briefings in Bioinformatics
Abstract: Proteins play an important role in life activities and are the basic units for performing functions. Accurately annotating functions to proteins is crucial for understanding the intricate mechanisms of life and developing effective treatments for complex diseases. Traditional biological experiments struggle to keep pace with the growing number of known proteins. With the development of high-throughput sequencing technology, a wide variety of biological data provides the possibility to accurately predict protein functions by computational methods. Consequently, many computational methods have been proposed. Due to the diversity of application scenarios, it is necessary to conduct a comprehensive evaluation of these computational methods to determine the suitability of each algorithm for specific cases. In this study, we present a comprehensive benchmark, BeProf, to process data and evaluate representative computational methods. We first collect the latest datasets and analyze the data characteristics. Then, we investigate and summarize 17 state-of-the-art computational methods. Finally, we propose a novel comprehensive evaluation metric, design eight application scenarios and evaluate the performance of existing methods on these scenarios. Based on the evaluation, we provide practical recommendations for different scenarios, enabling users to select the most suitable method for their specific needs. All of these servers can be obtained from https://csuligroup.com/BEPROF and https://github.com/CSUBioGroup/BEPROF.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses several key challenges in automatic protein function prediction (AFP):

1. **Lack of comprehensive evaluation of existing methods**
   - Although many computational methods have been proposed for AFP, they have not been compared thoroughly, so researchers lack clear criteria for choosing among them.
   - The methods have not been evaluated systematically across different application scenarios.
2. **Limitations of existing evaluation metrics**
   - Current evaluation metrics ignore the structural relationships between protein functions, so they cannot accurately measure the overall performance of AFP methods.
3. **Insufficient use of data resources**
   - High-throughput sequencing has made large amounts of biological data available, but using these resources effectively for protein function prediction remains a challenge.

### Solutions

To address these challenges, the authors propose BeProf, a comprehensive computational benchmark platform with four goals:

1. **Introduce a new evaluation metric**
   - BeProf introduces a comprehensive evaluation metric that considers both the depth of a function and its Information Content (IC), giving a more accurate assessment of different methods.
2. **Cover the latest data resources**
   - BeProf incorporates the latest databases, processes the data into model inputs, and analyzes the distributions of proteins and functions.
3. **Evaluate multiple computational methods**
   - BeProf covers 17 representative computational methods and designs 8 application scenarios to test their performance thoroughly.
4. **Provide practical recommendations**
   - Based on the evaluation results, BeProf offers practical recommendations for different application scenarios, helping users choose the method best suited to their needs.

### Conclusion

Through these measures, BeProf provides a valuable reference for research on protein function prediction and helps advance the field.
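The new BeProf metric is described as combining the depth of a function term with its Information Content (IC), but this summary does not give the formula. As a hedged illustration only, the sketch below computes those two ingredients on a tiny hypothetical GO-like hierarchy. It assumes depth means the shortest edge count from a term up to the root (one of several conventions) and uses the conditional-probability definition of IC common in CAFA-style evaluation, IC(t) = -log2 P(t | parents(t)). The term IDs, annotations, and the helpers `depth` and `information_content` are invented for the example; how BeProf actually combines depth and IC is not shown here.

```python
from collections import deque
from math import log2

# Hypothetical GO-like DAG (term -> list of parent terms); IDs are invented, not real GO terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}

# Hypothetical annotations, one set per protein, already propagated to all ancestors
# (true-path rule), so every annotated term's parents are present as well.
ANNOTATIONS = [
    {"root", "A", "C"},
    {"root", "A", "B", "D"},
    {"root", "A"},
    {"root", "B"},
]


def depth(term: str) -> int:
    """Assumed depth: shortest number of edges from `term` up to a root (breadth-first search)."""
    queue = deque([(term, 0)])
    seen = {term}
    while queue:
        node, d = queue.popleft()
        if not PARENTS[node]:  # a term without parents is a root
            return d
        for parent in PARENTS[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, d + 1))
    raise ValueError(f"no root reachable from {term!r}")


def information_content(term: str) -> float:
    """IC(t) = -log2 P(t | all parents of t), estimated from annotation counts."""
    parents = set(PARENTS[term])
    # Proteins annotated with every parent of the term (all proteins, for a root).
    with_parents = [a for a in ANNOTATIONS if parents <= a]
    # Of those, proteins also annotated with the term itself.
    with_term = [a for a in with_parents if term in a]
    if not with_term:
        raise ValueError(f"{term!r} never appears in the annotations")
    # log2(n_parents / n_term) equals -log2(n_term / n_parents) and avoids returning -0.0.
    return log2(len(with_parents) / len(with_term))


if __name__ == "__main__":
    for term in PARENTS:
        print(f"{term}: depth={depth(term)}, IC={information_content(term):.3f} bits")
```

In this toy data, depth alone does not determine how informative a term is: C and D both sit at depth 2, yet C has an IC of about 1.585 bits while D has an IC of 0, because every protein annotated with both A and B also carries D. This is one plausible reason a metric might look at depth and IC together.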