A Mixed-Feature-Driven Approach for Identifying Thermophilic Proteins Based on GradientBoosting

Cuihuan Zhao, Shuan Yan, Jiahang Li
DOI: https://doi.org/10.3390/ijms252211866
IF: 5.6
2024-11-06
International Journal of Molecular Sciences
Abstract: Thermophilic proteins remain stable and functional under extreme high-temperature conditions, which makes them important in both fundamental biological research and biotechnological applications. In this study, we developed TPGPred, a GradientBoosting-based machine learning model that predicts thermophilic proteins using a large-scale dataset of thermophilic and non-thermophilic protein sequences. By combining various machine learning algorithms with feature-engineering methods, we systematically evaluated classification performance and identified the optimal feature combination and classification model. Trained on a large public dataset of 5652 samples, TPGPred achieved an accuracy greater than 0.95 and an Area Under the Receiver Operating Characteristic Curve (AUROC) greater than 0.98 on an independent test set of 627 samples. Our findings offer new insights into the identification and classification of thermophilic proteins and provide a solid foundation for developing their industrial applications.
biochemistry & molecular biology; chemistry, multidisciplinary
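The abstract describes sequence-derived features and reports accuracy and AUROC, but it does not list the features themselves. The sketch below is an illustration only, not the authors' pipeline. It assumes amino acid composition (a common sequence descriptor) as a stand-in feature and shows how the two reported metrics can be computed from labels and classifier scores. The function names `aac_features`, `accuracy`, and `auroc`, and the 0.5 decision threshold, are hypothetical choices made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_features(sequence):
    """Return the 20-dimensional amino acid composition of a sequence.

    Assumption: this feature is illustrative; the abstract does not say
    which descriptors TPGPred actually uses.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice the feature vectors would be passed to a gradient boosting classifier, and its predicted probabilities on the held-out test sequences would serve as the `scores` argument here. The pairwise AUROC above is quadratic in the number of samples, which is acceptable for a test set of a few hundred sequences.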