Abstract:
Motivation: Next-generation sequencing technologies make it possible to detect rare genetic variants in individual patients. More than a dozen software tools and web services have been created to predict the pathogenicity of variants that change amino acid residues. Despite considerable effort in this area, no method yet reliably separates pathogenic from benign variants, and pathogenicity assessments from different tools often contradict one another. In this article, we propose describing amino acid substitutions by the structural formulas of peptides rather than by single-letter codes. This allowed us to investigate how effective a chemoinformatics approach is for assessing the pathogenicity of variants associated with amino acid substitutions.
Results: Structure-activity relationship analysis based on protein-specific data and atom-centric substructural multilevel neighborhoods of atoms (MNA) descriptors of molecular fragments proved suitable for predicting the pathogenic effect of single amino acid variants. An MNA-based Naïve Bayes classifier and data from ClinVar and humsavar were used to build structure-activity relationship models for 10 proteins. The performance of the models was compared with that of 11 prediction tools: 8 individual predictors (SIFT 4G, PolyPhen-2 HDIV, MutationAssessor, PROVEAN, FATHMM, MVP, LIST-S2, MutPred) and 3 consensus predictors (M-CAP, MetaSVM, MetaLR). The accuracy of the MNA-based method varies across proteins (AUC: 0.631-0.993; MCC: 0.191-0.891). Its performance was comparable both to that of the other individual predictors and to that of third-party protein-specific predictors. For several proteins (BRCA1, BRCA2, COL1A2, and RYR1), the performance of the MNA-based method was outstanding, indicating that it can capture the pathogenic effect of the structural changes introduced by amino acid substitutions.
Availability and implementation: The datasets are available as supplementary data at Bioinformatics online. A Python script that converts amino acid and nucleotide sequences from single-letter codes to SD files is available at https://github.com/SmirnygaTotoshka/SequenceToSDF. The authors provide trial licenses for the MultiPASS software to interested readers upon request.
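
The Results report performance as AUC and MCC ranges. For readers less familiar with these two metrics, the following is a minimal sketch of how each can be computed for a binary pathogenic/benign classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration and are not taken from the study.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = pathogenic, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC as the probability that a random pathogenic variant
    scores higher than a random benign one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one variant of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the paper): six variants of one protein.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class calls.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"AUC = {auc(labels, scores):.3f}")
print(f"MCC = {mcc(labels, preds):.3f}")
```

AUC is computed from the raw scores and does not depend on a threshold, whereas MCC is computed from thresholded class calls. This is one reason the two metrics can rank predictors differently for the same protein.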
