Abstract: Motivation: Next-generation sequencing technologies make it possible to detect rare genetic variants in individual patients. More than a dozen software tools and web services have been created to predict the pathogenicity of variants that change amino acid residues. Despite considerable effort in this area, no method yet reliably separates pathogenic from harmless variants, and pathogenicity assessments from different tools often contradict one another. In this article, we propose describing amino acid substitutions by the structural formulas of protein peptides rather than by a single-letter code. This allowed us to investigate how effective a chemoinformatics approach is for assessing the pathogenicity of variants associated with amino acid substitutions.
Results: Structure-activity relationship analysis based on protein-specific data and atom-centric substructural multilevel neighborhoods of atoms (MNA) descriptors of molecular fragments proved suitable for predicting the pathogenic effect of single amino acid variants. An MNA-based Naïve Bayes classifier, together with ClinVar and humsavar data, was used to build structure-activity relationship models for 10 proteins. The performance of the models was compared with 11 prediction tools: 8 individual (SIFT 4G, Polyphen2 HDIV, MutationAssessor, PROVEAN, FATHMM, MVP, LIST-S2, MutPred) and 3 consensus (M-CAP, MetaSVM, MetaLR). The accuracy of the MNA-based method varies across proteins (AUC: 0.631-0.993; MCC: 0.191-0.891). Its performance was comparable to that of the other individual predictors and of third-party protein-specific predictors. For several proteins (BRCA1, BRCA2, COL1A2, and RYR1), the performance of the MNA-based method was outstanding, capturing the pathogenic effect of the structural changes caused by amino acid substitutions.
Availability and implementation: The datasets are available as supplementary data at Bioinformatics online. A Python script to convert amino acid and nucleotide sequences from single-letter codes to SD files is available at https://github.com/SmirnygaTotoshka/SequenceToSDF. The authors provide trial licenses for the MultiPASS software to interested readers upon request.
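
The results above are reported as AUC and MCC. For readers less familiar with these two metrics, the following is a minimal sketch of how they can be computed for a binary pathogenic/harmless classifier. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; they are not taken from the paper, its datasets, or the MultiPASS software.

```python
import math


def mcc(labels, predictions):
    """Matthews correlation coefficient for binary labels (1 = pathogenic, 0 = harmless)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed threshold of 0.5 to turn scores into class predictions.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_mcc = mcc(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"MCC = {toy_mcc:.3f}, AUC = {toy_auc:.3f}")
```

MCC depends on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the two ranges quoted above can differ so much for the same protein.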

TransEFVP: A Two-Stage Approach for the Prediction of Human Pathogenic Variants Based on Protein Sequence Embedding Fusion

Enhancing missense variant pathogenicity prediction with protein language models using VariPred

VariPred: Enhancing Pathogenicity Prediction of Missense Variants Using Protein Language Models

TEMPO: A Transformer-Based Mutation Prediction Framework for SARS-CoV-2 Evolution

FunSAV: Predicting the Functional Effect of Single Amino Acid Variants Using a Two-Stage Random Forest Model

DTVF: A User-Friendly Tool for Virulence Factor Prediction Based on ProtT5 and Deep Transfer Learning Models

Deep Learning Prediction of Ribosome Profiling with Translatomer Reveals Translational Regulation and Interprets Disease Variants

Synthesis of inorganic polymers as glass precursors and for other uses: Pre‐ceramic block or graft copolymers as potential precursors to composite materials

SVPath: an accurate pipeline for predicting the pathogenicity of human exon structural variants

Viral transmission: Deadly contact

Computational identification of deleterious synonymous variants in human genomes using a feature-based approach

What is the diagnostic yield of colonoscopy in patients with a referral diagnosis of constipation in South Africa?

PhyloTransformer: A Discriminative Model for Mutation Prediction Based on a Multi-head Self-attention Mechanism

PhyloTransformer: A Self-supervised Discriminative Model for Mutation Prediction Based on a Multi-head Self-attention Mechanism

Accurate prediction of functional effect of single amino acid variants with deep learning

MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction

Prediction of pathogenic single amino acid substitutions using molecular fragment descriptors

Deciphering the Language of Nature: A transformer-based language model for deleterious mutations in proteins

Mvppt: A Highly Efficient and Sensitive Pathogenicity Prediction Tool for Missense Variants

MVP predicts the pathogenicity of missense variants by deep learning

Efficiently Predicting Mutational Effect on Homologous Proteins by Evolution Encoding