Abstract: Protein stability, a critical aspect of molecular biology and biochemistry, hinges on an intricate balance of thermodynamic and structural factors. Determining protein stability is crucial for understanding and manipulating biological machinery, because stability is directly correlated with protein function. This study therefore examines how protein stability depends on thermodynamic, thermal and structural factors. It focuses on four parameters that are critical for understanding protein stability: the free energy change of unfolding (ΔG), the change in heat capacity (ΔCp) associated with the structural transition, the melting temperature (Tm) and the number of disulfide bonds. A machine learning (ML) model was developed to predict these four parameters from the primary sequence of the protein. The work was motivated by the shortage of available tools that predict protein stability from multiple parameters. A multi-layer convolutional neural network (CNN) was adopted to obtain a more reliable model, and a separate predictive model was built for each property. All models showed high accuracy, with coefficients of determination (R²) of 0.79, 0.78, 0.92 and 0.92 for ΔG, ΔCp, Tm and the number of disulfide bonds, respectively. To validate the predictions, a case study compared the stability of two homologous proteins through in silico analysis using molecular docking and molecular dynamics simulations. This validation confirmed the accuracy of each model in predicting the stability-associated properties.
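The abstract does not spell out the network architecture. As a rough illustration of how a CNN can map a primary sequence to a single stability value, the following pure-Python sketch one-hot encodes the sequence, applies one 1D convolution layer with ReLU, global max pooling and a linear output, and scores predictions with R². It is a toy under stated assumptions, not the authors' model: the single convolution layer, the filter count and kernel width (`N_FILTERS`, `KERNEL`), the random untrained weights and all function names are hypothetical. Because the study built one model per property, one such regressor would be fitted separately for each of ΔG, ΔCp, Tm and the disulfide-bond count.

```python
import random

# The 20 canonical amino acids; any other letter is encoded as an all-zero row.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Return an L x 20 one-hot matrix (list of lists) for a protein sequence."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(x, kernels, bias):
    """'Valid' 1D convolution along the sequence, followed by ReLU.

    x: L x C, kernels: F x K x C, bias: F  ->  returns (L - K + 1) x F.
    """
    k = len(kernels[0])
    if len(x) < k:
        raise ValueError("sequence is shorter than the kernel width")
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        feats = []
        for kernel, b in zip(kernels, bias):
            s = b
            for w_row, x_row in zip(kernel, window):
                s += sum(w * v for w, v in zip(w_row, x_row))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def global_max_pool(h):
    """One maximum per filter, so sequences of any length map to F features."""
    return [max(column) for column in zip(*h)]


def predict(seq, kernels, bias, w_out, b_out):
    """Sequence -> one-hot -> conv + ReLU -> global max pool -> linear output."""
    pooled = global_max_pool(conv1d_relu(one_hot(seq), kernels, bias))
    return sum(w * p for w, p in zip(w_out, pooled)) + b_out


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical hyperparameters and random, untrained weights (illustration only).
N_FILTERS, KERNEL = 8, 5
rng = random.Random(0)
kernels = [[[rng.gauss(0.0, 0.1) for _ in AMINO_ACIDS] for _ in range(KERNEL)]
           for _ in range(N_FILTERS)]
bias = [0.0] * N_FILTERS
w_out = [rng.gauss(0.0, 0.1) for _ in range(N_FILTERS)]

y_hat = predict("MKTAYIAKQRQISFVKSHFSRQ", kernels, bias, w_out, 0.0)
print(f"untrained prediction: {y_hat:.4f}")
print(f"R^2 example: {r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]):.3f}")
```

Global max pooling is used here only because it makes the output independent of sequence length; a trained model would learn the weights from labelled stability data rather than drawing them at random.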
Aligning physics-based principles with ML models makes it possible to replace the computationally demanding physics-based calculations used to determine protein stability with a fast ML solution. This work also provides insights into the impact of mutations on protein stability, with implications for protein engineering. The source code is available at https://github.com/Growdeatechnology.
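ΔG, ΔCp and Tm are predicted by separate models, but for a reversible two-state unfolding transition they are not independent: the Gibbs-Helmholtz equation, ΔG(T) = ΔH_m(1 − T/Tm) + ΔCp[(T − Tm) − T ln(T/Tm)], ties them together, which offers one physics-based way to sanity-check predicted values. The sketch below assumes two-state unfolding, a temperature-independent ΔCp, and an enthalpy of unfolding at Tm (ΔH_m) supplied separately, since ΔH_m is not among the four predicted properties. The function name and the example numbers are illustrative only, not values from this study.

```python
import math


def delta_g_unfolding(T, dH_m, T_m, dCp):
    """Two-state Gibbs-Helmholtz stability curve.

    T, T_m : temperature and melting temperature, in kelvin
    dH_m   : enthalpy of unfolding at T_m, kJ/mol (assumed extra input)
    dCp    : heat-capacity change of unfolding, kJ/(mol*K), assumed constant
    Returns dG_unfolding(T) in kJ/mol; a positive value means the folded state is favoured.
    """
    return dH_m * (1.0 - T / T_m) + dCp * ((T - T_m) - T * math.log(T / T_m))


# Illustrative parameter values chosen for the example, not results from this study.
dH_m, T_m, dCp = 400.0, 333.15, 6.0
for T in (298.15, 323.15, 333.15, 343.15):
    dG = delta_g_unfolding(T, dH_m, T_m, dCp)
    print(f"T = {T:.2f} K   dG_unfolding = {dG:7.2f} kJ/mol")
```

By construction ΔG is zero at Tm and changes sign across it. With ΔCp > 0 the curve is concave, so ΔG peaks at a temperature of maximum stability and also decreases toward low temperatures (cold denaturation).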
