Decoding molecular mechanisms for loss of function variants in the human proteome

Matteo Cagiada, Nicolas Jonsson, Kresten Lindorff-Larsen
DOI: https://doi.org/10.1101/2024.05.21.595203
2024-05-22
Abstract: Proteins play a critical role in cellular function by interacting with other biomolecules; missense variants that cause loss of protein function can lead to a broad spectrum of genetic disorders. While much progress has been made on predicting which missense variants may cause disease, our ability to predict the underlying molecular mechanisms remains limited. One common mechanism is that missense variants cause protein destabilization resulting in lowered protein abundance and loss of function, while other variants directly disrupt key interactions with other molecules. We have here leveraged machine learning models for protein sequence and structure to disentangle effects on protein function and abundance, and applied our resulting model to all missense variants in the human proteome. We find that approximately half of all missense variants that lead to loss of function and disease do so because they disrupt protein stability. We have predicted functionally important positions in all human proteins, and find that they cluster on protein structures and are often found on the protein surface. Our work provides a resource for interpreting both predicted and experimental variant effects across the human proteome, and a mechanistic starting point for developing therapies towards genetic diseases.
Bioinformatics
What problem does this paper attempt to address?
The paper asks how to predict the specific molecular mechanism by which a missense variant causes loss of protein function. Specifically, the authors aim to develop a method that distinguishes variants that cause loss of function by reducing protein stability from variants that directly interfere with protein function. This distinction matters for understanding the molecular basis of genetic diseases and for developing corresponding treatment strategies.

### Main contributions of the paper

1. **Development of the FunC-ESMs model**:
   - **Predicting loss-of-function variants**: By combining two pre-trained machine-learning models, ESM-1b and ESM-IF, FunC-ESMs can quickly predict which missense variants lead to loss of protein function.
   - **Distinguishing mechanisms**: The model further distinguishes whether a variant causes loss of function by reducing protein stability or by directly affecting protein function.
2. **Large-scale application**:
   - **The human proteome**: The model has been applied to all missense variants in the human proteome, including more than 100,000 clinically annotated variants across more than 10,000 human proteins.
   - **Discovery of functionally important sites**: Analysis of the predictions reveals a large number of functionally important sites that cluster in the protein structure and are often located on the protein surface.
3. **Verification and accuracy**:
   - **Experimental verification**: The accuracy of FunC-ESMs has been checked against multiple experimental datasets, including multiplexed assays of variant effects (MAVEs), and against known clinically annotated variants.
   - **Performance evaluation**: In predicting clinical pathogenicity, the model performs close to current state-of-the-art methods.
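The two-step logic summarized above (first flag loss of function, then ask whether stability is also lost) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the two cutoffs, the convention that lower scores are more damaging, and the toy variants are all hypothetical. One score stands in for a sequence-based prediction of overall functional impact and the other for a structure-based proxy of stability.

```python
from enum import Enum


class Mechanism(Enum):
    WT_LIKE = "wild-type-like"
    STABILITY_LOSS = "loss of function via reduced stability"
    FUNCTION_LOSS = "loss of function with stability retained"


# Hypothetical cutoffs for illustration only; not values from the paper.
FUNCTION_CUTOFF = -1.0
STABILITY_CUTOFF = -1.0


def classify_variant(function_score: float, stability_score: float) -> Mechanism:
    """Assign a mechanism label from two per-variant scores.

    Assumed convention: lower scores are more damaging.
    """
    if function_score > FUNCTION_CUTOFF:
        # Not predicted to lose function at all.
        return Mechanism.WT_LIKE
    if stability_score <= STABILITY_CUTOFF:
        # Loses function and is also predicted to be destabilized.
        return Mechanism.STABILITY_LOSS
    # Loses function while the fold is predicted to stay stable.
    return Mechanism.FUNCTION_LOSS


def stability_fraction(scores: dict[str, tuple[float, float]]) -> float:
    """Fraction of predicted loss-of-function variants attributed to stability."""
    labels = [classify_variant(f, s) for f, s in scores.values()]
    lof = [m for m in labels if m is not Mechanism.WT_LIKE]
    if not lof:
        return 0.0
    return sum(m is Mechanism.STABILITY_LOSS for m in lof) / len(lof)


# Toy variants with made-up (function_score, stability_score) pairs.
TOY_SCORES = {
    "A10V": (-0.2, -0.1),
    "G25D": (-3.0, -2.5),
    "R40H": (-2.8, -0.3),
    "L55P": (-3.5, -3.0),
}
```

Calling `stability_fraction(TOY_SCORES)` on the toy data gives the share of loss-of-function variants labeled as stability-driven, which is the kind of quantity behind the paper's "approximately half" estimate. Positions where many substitutions fall into `FUNCTION_LOSS` would be natural candidates for functionally important sites.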
### Main findings

- **Variants affecting protein stability**: Approximately half of the missense variants that lead to loss of function do so by disrupting protein stability.
- **Functionally important sites**: Functionally important sites tend to cluster in the protein structure and are usually located on the protein surface.
- **Variants in different regions**: In folded regions, most pathogenic variants act either by directly affecting protein function or by reducing protein stability; in disordered regions, most pathogenic variants act by directly affecting protein function.

### Conclusion

This study provides a resource for interpreting the effects of missense variants across the human proteome and offers a mechanistic view of how those variants give rise to genetic diseases. It can also serve as a starting point for developing therapies for genetic diseases.