Abstract: Protein stability, a critical aspect of molecular biology and biochemistry, hinges on an intricate balance of thermodynamic and structural factors. Determining protein stability is crucial for understanding and manipulating biological machinery, because stability is directly correlated with protein function. This study therefore examines protein stability in detail, highlighting its dependence on thermodynamic, thermal, and structural factors. It focuses on four parameters that are critical to understanding protein stability: the free energy change of unfolding (ΔG_unfolding), the change in heat capacity (ΔCp) accompanying the protein's structural transition, the melting temperature (Tm), and the number of disulfide bonds. A machine learning (ML) model was developed to predict these four parameters from the protein's primary sequence. The study was motivated by the shortage of available tools that predict protein stability from multiple parameters. A multi-layer convolutional neural network (CNN) was adopted to build a more reliable ML model, and a separate predictive model was trained for each property. All models achieved high accuracy, with coefficients of determination (R²) of 0.79, 0.78, 0.92 and 0.92 for ΔG, ΔCp, Tm and the number of disulfide bonds, respectively. To validate the predictions, a case study analysed the stability of two homologous proteins in silico using molecular docking and molecular dynamics simulations. This validation confirmed the accuracy of each model in predicting stability-associated properties.
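Three of the predicted quantities are linked thermodynamically. As an illustration only (this is not part of the study's method), the standard Gibbs–Helmholtz stability curve, which assumes a temperature-independent ΔCp, gives ΔG_unfolding at temperature T from Tm, ΔCp and the unfolding enthalpy at Tm (ΔH_m), a quantity the models do not predict: ΔG(T) = ΔH_m(1 − T/Tm) + ΔCp[(T − Tm) − T ln(T/Tm)]. The minimal Python sketch below evaluates this curve. The function names and parameter values are hypothetical and were chosen for illustration; they are not results from the study.

```python
import math


def unfolding_free_energy(t, tm, dh_m, dcp):
    """Gibbs-Helmholtz stability curve: ΔG_unfolding(T) in kJ/mol.

    t, tm : temperature and melting temperature, in kelvin
    dh_m  : unfolding enthalpy at Tm in kJ/mol (an extra input, NOT one of
            the four parameters predicted by the models in the abstract)
    dcp   : heat-capacity change of unfolding in kJ/(mol*K), assumed to be
            independent of temperature
    """
    return dh_m * (1.0 - t / tm) + dcp * ((t - tm) - t * math.log(t / tm))


def max_stability_temperature(tm, dh_m, dcp):
    """Temperature where ΔS_unfolding = 0, i.e. the top of the stability curve."""
    return tm * math.exp(-dh_m / (tm * dcp))


# Hypothetical, made-up parameters for illustration only
TM = 333.15   # 60 °C, in kelvin
DH_M = 300.0  # kJ/mol
DCP = 8.0     # kJ/(mol*K)

for celsius in (0, 25, 37, 60, 70):
    t = celsius + 273.15
    dg = unfolding_free_energy(t, TM, DH_M, DCP)
    print(f"{celsius:>3} °C  ΔG_unfolding = {dg:7.2f} kJ/mol")

ts = max_stability_temperature(TM, DH_M, DCP)
print(f"maximum stability near {ts - 273.15:.1f} °C")
```

By construction ΔG(Tm) = 0, and because the curve is concave (d²ΔG/dT² = −ΔCp/T), it has a single maximum at the temperature where ΔS_unfolding = 0. This is why a larger ΔCp at a fixed Tm narrows the temperature window of stability.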
Aligning physics-based principles with ML models makes it possible to develop a fast ML alternative to the computationally demanding physics-based calculations otherwise used to determine protein stability. This work also provides insights into the impact of mutations on protein stability, with implications for protein engineering. The source code is available at https://github.com/Growdeatechnology.
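A fast sequence-based predictor makes exhaustive mutational scans cheap. Below is a minimal sketch of such a scan. It assumes a hypothetical callable `predict_dg(sequence) -> float` that returns ΔG_unfolding, and it is not the interface of the published code. The scan enumerates every single-point substitution and reports ΔΔG = ΔG(mutant) − ΔG(wild type). The `toy_predict_dg` function is a placeholder that only exercises the pipeline; it has no physical meaning.

```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_point_mutants(seq: str) -> List[Tuple[str, str]]:
    """Return (label, mutant_sequence) for every single substitution, e.g. 'K2A'."""
    mutants = []
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wt:
                label = f"{wt}{i + 1}{aa}"  # 1-based position notation
                mutants.append((label, seq[:i] + aa + seq[i + 1:]))
    return mutants


def ddg_scan(seq: str, predict_dg: Callable[[str], float]) -> Dict[str, float]:
    """ΔΔG = ΔG_unfolding(mutant) - ΔG_unfolding(wild type) for each mutant.

    Sign convention assumed here: ΔG_unfolding > 0 means the folded state is
    favoured, so a negative ΔΔG marks a destabilizing mutation.
    """
    dg_wt = predict_dg(seq)
    return {label: predict_dg(mut) - dg_wt for label, mut in single_point_mutants(seq)}


def toy_predict_dg(seq: str) -> float:
    # Hypothetical placeholder ONLY: stands in for a trained sequence model
    return 0.5 * sum(seq.count(aa) for aa in "AILMFVW")


if __name__ == "__main__":
    scores = ddg_scan("MKTAYIAK", toy_predict_dg)
    # List the five most destabilizing substitutions under the toy predictor
    for label, ddg in sorted(scores.items(), key=lambda kv: kv[1])[:5]:
        print(f"{label}: ΔΔG = {ddg:+.2f}")
```

Because each call to the predictor is a single forward pass, a scan of length × 19 mutants stays inexpensive. A molecular dynamics estimate of each mutant, by contrast, would be far more costly.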
