Abstract:
Motivation: Next Generation Sequencing technologies make it possible to detect rare genetic variants in individual patients. More than a dozen software tools and web services have been developed to predict the pathogenicity of variants that change amino acid residues. Despite considerable efforts in this area, there is still no ideal method for distinguishing pathogenic from harmless variants, and pathogenicity assessments are often contradictory. In this article, we propose describing amino acid substitutions by the structural formulas of the corresponding peptides rather than by the single-letter code. This allowed us to investigate the effectiveness of a chemoinformatics approach for assessing the pathogenicity of variants associated with amino acid substitutions.
Results: Structure-activity relationship analysis based on protein-specific data and atom-centric substructural descriptors of molecular fragments, multilevel neighborhoods of atoms (MNA), proved suitable for predicting the pathogenic effect of single amino acid variants. Structure-activity relationship models for 10 proteins were built with an MNA-based Naïve Bayes classifier using ClinVar and humsavar data. The performance of the models was compared with that of 11 prediction tools: 8 individual (SIFT 4G, PolyPhen-2 HDIV, MutationAssessor, PROVEAN, FATHMM, MVP, LIST-S2, MutPred) and 3 consensus (M-CAP, MetaSVM, MetaLR). The accuracy of the MNA-based method varies between proteins (AUC: 0.631-0.993; MCC: 0.191-0.891). It was comparable to that of the other individual predictors and of third-party protein-specific predictors. For several proteins (BRCA1, BRCA2, COL1A2, and RYR1), the MNA-based method performed outstandingly, capturing the pathogenic effect of the structural changes caused by amino acid substitutions.
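For readers less familiar with the two metrics reported above, the following is a minimal sketch of how the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC) can be computed for a binary pathogenic/benign classifier. The function names `mcc` and `roc_auc` are illustrative assumptions and are not taken from the authors' software; the AUC is computed here with the pairwise-ranking definition, which may differ from the procedure used in the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Illustrative helper (assumed name). Returns 0.0 when the
    denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def roc_auc(scores, labels):
    """AUC as the probability that a random pathogenic variant (label 1)
    receives a higher score than a random benign one (label 0).

    Ties count as half a win. Illustrative helper (assumed name).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An MCC of 0 corresponds to chance-level classification and 1 to perfect agreement, while an AUC of 0.5 corresponds to random ranking, which gives some context for the reported ranges.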
Availability and implementation: The datasets are available as supplemental data at Bioinformatics online. A Python script to convert amino acid and nucleotide sequences from single-letter codes to SD files is available at https://github.com/SmirnygaTotoshka/SequenceToSDF. The authors provide trial licenses for MultiPASS software to interested readers upon request.
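As an illustration of the kind of single-letter input such a conversion starts from, the sketch below extracts the wild-type and substituted peptide around a variant position. It is only an assumption-laden example: the function name `variant_peptides`, the `flank` parameter, and the default window size are hypothetical, and the abstract does not state how long the peptide fragments used by the authors are or how the SequenceToSDF script is invoked.

```python
# The 20 standard amino acids in single-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def variant_peptides(sequence, position, ref, alt, flank=5):
    """Return (wild_type, variant) peptide windows around a substitution.

    Hypothetical helper: `position` is 1-based, `flank` is the number of
    residues kept on each side (an assumed value, not from the paper).
    """
    if ref not in AMINO_ACIDS or alt not in AMINO_ACIDS:
        raise ValueError("ref and alt must be standard amino acids")
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError("position is outside the sequence")
    if sequence[idx] != ref:
        raise ValueError("reference residue does not match the sequence")
    # Clip the window at the sequence ends.
    start = max(0, idx - flank)
    end = min(len(sequence), idx + flank + 1)
    wild_type = sequence[start:end]
    variant = sequence[start:idx] + alt + sequence[idx + 1:end]
    return wild_type, variant
```

Each of the two returned strings could then be passed to a sequence-to-structure converter to obtain the structural formulas on which fragment descriptors are calculated.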

Prediction of pathogenic single amino acid substitutions using molecular fragment descriptors
