Abstract: Protein stability, a critical aspect of molecular biology and biochemistry, hinges on an intricate balance of thermodynamic and structural factors. Determining protein stability is crucial for understanding and manipulating biological machinery, as it correlates directly with protein function. This study therefore examines protein stability and its dependence on thermodynamics, thermal conditions, and structural properties. It focuses on four parameters that are critical to understanding protein stability: the free energy change of unfolding (ΔG unfolding), the change in heat capacity (ΔCp) accompanying the protein structural transition, the melting temperature (Tm), and the number of disulfide bonds. A machine learning (ML) predictive model was developed to estimate these four parameters from the primary sequence of the protein. The work was motivated by the shortage of available tools that predict protein stability from multiple parameters. A multi-layer convolutional neural network (CNN) was adopted to develop a more reliable ML model. A separate predictive model was prepared for each property, and all of the models showed high accuracy. The coefficients of determination (R²) of the models were 0.79, 0.78, 0.92 and 0.92 for ΔG, ΔCp, Tm and the number of disulfide bonds, respectively. A case study on the stability of two homologous proteins was presented to validate the results predicted by the developed models. The case study comprised in silico analysis of protein stability using molecular docking and molecular dynamics simulations. This validation study supported the accuracy of each model in predicting the stability-associated properties.
The alignment of physics-based principles with ML models has provided an opportunity to develop a fast machine learning alternative to the computationally demanding physics-based calculations used to determine protein stability. Furthermore, this work provides insights into the impact of mutation on protein stability, which has implications for protein engineering. The source code is available at https://github.com/Growdeatechnology.
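As a rough illustration of two ideas mentioned above, the sketch below shows one common way a primary sequence could be turned into a numeric input for a CNN, and how the coefficient of determination (R²) is computed from true and predicted values. The abstract does not state which sequence encoding was used, so the one-hot scheme, the function names `one_hot_encode` and `r_squared`, and the padding length are assumptions made for illustration only, not the authors' implementation.

```python
# Hypothetical helpers for illustration; not taken from the authors' repository.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix (list of lists).

    Assumption: one-hot encoding with zero padding. Unknown residues and
    padded positions are left as all-zero rows; sequences longer than
    max_len are truncated.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()[:max_len]):
        col = index.get(residue)
        if col is not None:
            matrix[pos][col] = 1.0
    return matrix


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


encoded = one_hot_encode("ACD", max_len=5)
perfect_fit = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only_fit = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under this definition, an R² of 1 means the predictions match the measured values exactly, while 0 means the model does no better than always predicting the mean of the measured values.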
