Abstract: The clinical interpretation of missense variants is critical in diagnostics because these variants can alter protein structure and produce mild to severe phenotypic effects, with direct consequences for disease outcomes and patient management. Many computational tools, known as in silico pathogenicity predictors (ISPPs), have been developed to support the assessment of variant pathogenicity, each addressing different aspects of variant interpretation. Despite their abundance, ISPP predictions often lack accuracy and consistency, primarily because of limited data availability and erroneous data. This inconsistency can lead to false positive or false negative results in pathogenicity evaluation, underscoring the need for standardization. In addition, the sheer number of ISPPs and their varied performance make it difficult to reach consensus among predictions. A statistical approach that evaluates and integrates these predictors is therefore essential to improve accuracy. Here, we present a comprehensive statistical analysis comparing 52 available ISPPs, aimed at improving the precision of variant classification. We introduce the Variant Analysis with Multiple Pathogenicity Predictors-score (VAMPP-score), a novel statistical framework for the assessment of missense variants. The VAMPP-score leverages the best gene-ISPP matches based on ISPP accuracies, providing a combinatorial weighted score that improves missense variant interpretation. We chose to develop a statistical framework rather than a new ISPP in order to capitalize on the strengths of existing predictors and to address their limitations through an integrative approach.
This approach not only improves the evaluation of missense variants but also offers a flexible statistical framework designed to identify and utilize the best-performing ISPPs. By enhancing the accuracy of genetic diagnostics, particularly in the reanalysis of rare and undiagnosed cases, our framework aims to improve patient outcomes and advance the field of genetic research.
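The abstract does not give the VAMPP-score formula, so the sketch below only illustrates the general idea of a gene-specific, accuracy-weighted combination of predictors. It is not the authors' method. The function name `weighted_score`, the `top_k` cutoff, the placeholder predictor names, and the assumption that every predictor output has already been rescaled to the range 0 to 1 (higher meaning more likely pathogenic) are all hypothetical.

```python
from typing import Dict


def weighted_score(
    predictions: Dict[str, float],
    gene_accuracy: Dict[str, float],
    top_k: int = 3,
) -> float:
    """Accuracy-weighted mean of the top_k most accurate predictors for one gene.

    predictions:   predictor name -> score for one variant, assumed rescaled to [0, 1]
    gene_accuracy: predictor name -> accuracy of that predictor on this gene
    Both inputs and the weighting rule are illustrative assumptions.
    """
    # Use only predictors that scored the variant and have a known gene-level accuracy.
    usable = [name for name in predictions if name in gene_accuracy]
    if not usable:
        raise ValueError("no predictor has both a score and a gene-level accuracy")

    # Select the best gene-predictor matches by accuracy (hypothetical selection rule).
    best = sorted(usable, key=lambda name: gene_accuracy[name], reverse=True)[:top_k]

    total_weight = sum(gene_accuracy[name] for name in best)
    if total_weight == 0:
        raise ValueError("selected predictors all have zero accuracy")

    # Weight each predictor's score by its accuracy on this gene.
    return sum(predictions[name] * gene_accuracy[name] for name in best) / total_weight


if __name__ == "__main__":
    # Placeholder predictor names and made-up numbers, for illustration only.
    scores = {"ispp_a": 1.0, "ispp_b": 0.0, "ispp_c": 0.5}
    accuracies = {"ispp_a": 0.9, "ispp_b": 0.6, "ispp_c": 0.3}
    print(weighted_score(scores, accuracies, top_k=2))
```

With `top_k=2`, only the two predictors that are most accurate for the gene contribute, so a predictor that performs poorly on that gene has no influence on the combined score.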

Enhancing missense variant pathogenicity prediction with protein language models using VariPred

VariPred: Enhancing Pathogenicity Prediction of Missense Variants Using Protein Language Models

ProPath: Disease-Specific Protein Language Model for Variant Pathogenicity

Genome-wide prediction of disease variant effects with a deep protein language model

Cross-protein transfer learning substantially improves disease variant prediction

REVEL: an Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants

Unsupervised language models for disease variant prediction

Accurate proteome-wide missense variant effect prediction with AlphaMissense

MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction

MVP predicts the pathogenicity of missense variants by deep learning

Enhancing Missense Variant Pathogenicity Prediction with MissenseNet: Integrating Structural Insights and ShuffleNet-Based Deep Learning Techniques

Utilizing protein structure graph embeddings to predict the pathogenicity of missense variants

Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction

Mvppt: A Highly Efficient and Sensitive Pathogenicity Prediction Tool for Missense Variants

Assessment of variant effect predictors unveils variants difficulty as a critical performance indicator

A New Era in Missense Variant Analysis: Statistical Insights and the Introduction of VAMPP-Score for Pathogenicity Assessment

TransEFVP: A Two-Stage Approach for the Prediction of Human Pathogenic Variants Based on Protein Sequence Embedding Fusion

Fine-tuning the ESM2 protein language model to understand the functional impact of missense variants

Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes

Understanding structure-guided variant effect predictions using 3D convolutional neural networks

Protein Language Model Predicts Mutation Pathogenicity and Clinical Prognosis