Abstract: Proteins play a pivotal role in many biological processes, and changes in their amino acid sequences can lead to dysfunction and disease. These changes can affect protein folding or interactions with other biomolecules, for example by causing proteins to misfold or by preventing antibodies from inhibiting a viral infection. The ability to predict the effects of mutations in proteins is therefore crucial. Although experimental techniques can accurately quantify the effect of mutations on protein folding free energies and protein-protein binding free energies, they are often time-consuming and costly. By contrast, computational techniques offer fast and cost-effective alternatives for estimating free energies, but they typically suffer from lower accuracy. Enhancing the accuracy of computational predictions is of high importance, with the potential to greatly impact fields ranging from drug design to understanding disease mechanisms. One widely used computational method, FoldX, can rapidly predict the relative folding stability of a protein (the folding ΔΔG) as well as the relative binding affinity between proteins (the binding ΔΔG) using a single protein structure as input. However, it can suffer from low accuracy, especially for antibody-antigen systems. In this work, we trained a neural network on FoldX output to enhance its prediction accuracy. We first performed FoldX calculations on the largest available datasets of mutations with experimentally measured ΔΔG that affect binding (SKEMPIv2) and folding (ProTherm4). Features were then extracted from the FoldX output files, including the FoldX prediction for ΔΔG. We then developed and optimized a neural network framework to predict the difference between the FoldX-estimated ΔΔG and the experimental data, creating a model capable of producing a correction factor. Our approach substantially improved the Pearson correlation with experiment. For single mutations affecting folding, the correlation improved from a baseline of 0.3 to 0.66. For binding, the correlation increased from 0.37 to 0.61 for single mutations and from 0.52 to 0.81 for double mutations. For epistasis, the correlation for binding affinity (both singles and doubles) improved from 0.19 to 0.59. Our results also indicated that models trained on double mutations enhanced accuracy when predicting higher-order mutations (such as triple or quadruple mutations), whereas models trained on singles did not. This suggests that interaction energy and epistasis effects present in the FoldX output are not fully utilized by FoldX itself. Once trained, these models add minimal computational time but provide a substantial increase in performance, especially for higher-order mutations and epistasis. This makes them a valuable addition to any free energy prediction pipeline using FoldX. Furthermore, we believe this technique can be further optimized and tested for predicting antibody escape, aiding in the efficient development of watch lists.
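The correction-factor idea described in the abstract, learning the difference between the FoldX estimate and the experimental ΔΔG and then adding the predicted difference back to the FoldX value, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the three extra feature columns are hypothetical stand-ins for FoldX energy terms, and a linear least-squares fit replaces the neural network used in the work so that the example depends only on NumPy.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def design_matrix(features, foldx_ddg):
    """Stack the feature columns, the FoldX ddG itself, and a bias column."""
    return np.column_stack([features, foldx_ddg, np.ones(len(foldx_ddg))])

def fit_correction(features, foldx_ddg, exp_ddg):
    """Fit a linear model to the residual (experimental - FoldX).

    Stand-in for the neural network: any regressor trained on this
    residual plays the same role of producing a correction factor.
    """
    residual = exp_ddg - foldx_ddg
    weights, *_ = np.linalg.lstsq(design_matrix(features, foldx_ddg),
                                  residual, rcond=None)
    return weights

def apply_correction(weights, features, foldx_ddg):
    """Corrected ddG = FoldX ddG + predicted correction factor."""
    return foldx_ddg + design_matrix(features, foldx_ddg) @ weights

# Synthetic data (NOT real FoldX output or experimental measurements).
rng = np.random.default_rng(0)
n = 400
features = rng.normal(size=(n, 3))  # hypothetical energy-term features
exp_ddg = features @ np.array([1.0, -0.5, 0.8]) + rng.normal(scale=0.3, size=n)
# The mock "FoldX" estimate captures only one term and adds its own noise.
foldx_ddg = features[:, 0] + rng.normal(scale=1.0, size=n)

# Train on the first 300 mutations, evaluate on the held-out 100.
train, test = slice(0, 300), slice(300, n)
weights = fit_correction(features[train], foldx_ddg[train], exp_ddg[train])
corrected = apply_correction(weights, features[test], foldx_ddg[test])

r_baseline = pearson(foldx_ddg[test], exp_ddg[test])
r_corrected = pearson(corrected, exp_ddg[test])
print(f"baseline r = {r_baseline:.2f}, corrected r = {r_corrected:.2f}")
```

Training on the residual rather than on the experimental value directly keeps the FoldX estimate as the starting point, so the model only has to learn what FoldX gets wrong. The correlations printed by this sketch reflect the synthetic setup only and say nothing about the reported results.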
