Abstract: The expansion of high-quality, low-cost sequencing has created an enormous opportunity to understand how genetic variants alter cellular behaviour in disease. The high diversity of mutations observed has, however, drawn a spotlight onto the need for predictive modelling of mutational effects on phenotype from variants of uncertain significance. This is particularly important in the clinic because of its potential value in guiding diagnosis and patient treatment. Recent computational modelling has highlighted the importance of mutation-induced protein misfolding as a common mechanism for loss of protein or domain function, aided by developments in methods that make large computational screens tractable. Here we review recent applications of this approach to different genes, and how they have enabled and supported subsequent studies. We further discuss developments in the approach and its role in light of increasingly high-throughput experimental approaches.

What problem does this paper attempt to address?

The main problem this paper addresses is how gene mutations affect protein folding and function, especially when interpreting variants of uncertain significance (VUS). Specifically, the paper focuses on the following points:

1. **Interpretation problems of gene mutations**: High-throughput sequencing has produced a large amount of gene mutation data, but accurately interpreting the functional impact of these mutations remains a challenge. In the clinical setting in particular, many mutations are variants of uncertain significance (VUS), which complicates diagnosis and the selection of treatment options.
2. **Impact of protein folding changes**: The paper emphasizes that mutation-induced protein misfolding is an important mechanism in many diseases. Calculating the change in protein folding energy (\(\Delta\Delta G\)) allows the impact of a mutation on protein structure and function to be predicted, providing a basis for clinical diagnosis.
3. **Application and development of computational models**: To address these challenges, researchers have developed a variety of computational models and tools, such as FoldX and Rosetta, to evaluate the impact of mutations on protein stability. These tools improve the understanding of individual mutations and can also screen mutation effects across multiple genes at scale, helping to identify disease-related mutation patterns.
4. **Significance of clinical applications**: Combining experimental verification with computational simulation helps improve clinical guidelines and supports the classification and evaluation of gene mutations. For example, in the study of CDH1 gene mutations, the results of computational models were used to modify the clinical classification criteria, improving the ability to interpret VUS.
5. **Future development directions**: The paper also discusses possible future directions for the field, including improving the accuracy of computational tools, expanding to more types of genes and mutations, and integrating these methods into existing machine-learning frameworks to better predict and interpret the impact of gene mutations.

In summary, the paper aims to address key problems in gene mutation interpretation through protein folding calculations and to provide a scientific basis for clinical diagnosis and treatment.
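
To make the \(\Delta\Delta G\) idea in point 2 concrete, the following is a minimal Python sketch, not the paper's method. It assumes the common convention \(\Delta\Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{wild type}}\), where positive values indicate destabilisation. The 2.0 kcal/mol cutoff, the `Variant` class, the variant labels and the energy values are all hypothetical and chosen only for illustration; in practice the folding energies would come from a tool such as FoldX or Rosetta.

```python
from dataclasses import dataclass

# Assumed cutoff in kcal/mol. Illustrative only; not taken from the paper.
DESTABILISING_THRESHOLD = 2.0


@dataclass(frozen=True)
class Variant:
    """A missense variant with precomputed folding free energies (kcal/mol)."""
    name: str            # hypothetical label, e.g. "A123V"
    dg_wildtype: float   # folding free energy of the wild-type protein
    dg_mutant: float     # folding free energy of the mutant protein


def ddg(variant: Variant) -> float:
    """Return ddG = dG(mutant) - dG(wild type); positive means destabilising."""
    return variant.dg_mutant - variant.dg_wildtype


def classify(variant: Variant, threshold: float = DESTABILISING_THRESHOLD) -> str:
    """Label a variant by comparing its ddG against the assumed threshold."""
    if ddg(variant) >= threshold:
        return "likely destabilising"
    return "no strong destabilisation predicted"


# Hypothetical example values, not real measurements.
variants = [
    Variant("A123V", dg_wildtype=-8.0, dg_mutant=-5.0),  # ddG = 3.0
    Variant("K45R", dg_wildtype=-8.0, dg_mutant=-7.5),   # ddG = 0.5
]

for v in variants:
    print(f"{v.name}: ddG = {ddg(v):.1f} kcal/mol -> {classify(v)}")
```

A single fixed threshold is a simplification. A predicted lack of destabilisation does not imply that a variant is benign, since mutations can also disrupt function through mechanisms other than misfolding.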

Understanding large scale sequencing datasets through changes to protein folding

Predicted mechanistic impacts of human protein missense variants

Integrating Large-Scale Protein Structure Prediction into Human Genetics Research

Using machine learning to predict the effects and consequences of mutations in proteins

Machine Learning of Three-Dimensional Protein Structures to Predict the Functional Impacts of Genome Variation

Quantification of the effect of mutations using a global probability model of natural sequence variation

SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models

Learning the language of proteins and predicting the impact of mutations

Proteomic Analysis and Prediction of Amino Acid Variations That Influence Protein Posttranslational Modifications

Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes

Site saturation mutagenesis of 500 human protein domains reveals the contribution of protein destabilization to genetic disease

Ensemble Learning with Supervised Methods Based on Large-Scale Protein Language Models for Protein Mutation Effects Prediction

Studying protein folding in health and disease using biophysical approaches

Predicting non-neutral missense mutations and their biochemical consequences using genome-scale homology modeling of human protein complexes

Protein structural context of cancer mutations reveals molecular mechanisms and identifies novel candidate driver genes

Understanding genetic variants in context

Multiplexed assays of variant effects contribute to a growing genotype–phenotype atlas

Exploring the effects of missense mutations on protein thermodynamics through structure-based approaches: findings from the CAGI6 challenges

Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction

Heterogeneous folding landscapes and predetermined breaking points within a protein family

Genetic constraint at single amino acid resolution in protein domains improves missense variant prioritisation and gene discovery