Abstract: The paper considers the application of genomic signal processing methods to modeling and analyzing nanopore DNA sequencing signals. Using normal and mutated nucleotide sequences, 1200 signals were simulated, representing four classes: normal, missense mutation, insertion mutation, and deletion mutation. Correlation analysis was used to measure the similarity of nanopore DNA sequencing signals by computing the cross-correlation function between two ionic current signals through the protein nanopore: the normal signal and the signal in the presence of a mutation. The position of the correlation peak identifies the type of mutation (insertion or deletion) and gives the signal shift needed to align identical nucleotide subsequences. The results of applying machine learning methods to classify nanopore DNA sequencing signals depend significantly on the noise level of the recorded ionic current through the protein nanopore and on the type of mutation. At a relatively low noise level, when the ranges of ionic current values for different nucleotides do not overlap, classification accuracy reaches 100%. As the standard deviation of the noise distribution increases, the current levels recorded when the nanopore is blocked by nucleotides of similar size begin to overlap. As a result, errors often occur in distinguishing normal signals from single-nucleotide mutations (missense or nonsense), especially when two nucleotides produce similar current step levels in the nanopore (for example, guanine and thymine, thymine and adenine, or adenine and cytosine) and the noise masks their contribution to the current reduction.
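The peak-location rule can be illustrated with a small simulation. The sketch below is not the paper's implementation: it assumes a piecewise-constant current model in which every nucleotide blocks the pore for a fixed number of samples (`dwell`), hypothetical current levels (`LEVELS`), and i.i.d. Gaussian noise; all function names are illustrative. With the convention r[lag] = sum over n of (x[n] - mean_x)(y[n + lag] - mean_y), where x is the normal signal and y the mutated one, a positive peak lag indicates an insertion and a negative one a deletion. The indels are placed near the start of the sequence so that the shifted part of the signal dominates the correlation.

```python
import random

# Hypothetical mean blockade currents (pA), ordered so that the pairs the text
# calls similar (G/T, T/A, A/C) are adjacent levels. Not measured values.
LEVELS = {"G": 40.0, "T": 45.0, "A": 50.0, "C": 55.0}


def simulate_signal(seq, rng, dwell=5, sigma=1.0):
    """Piecewise-constant current: each nucleotide holds its level for `dwell`
    samples, plus i.i.d. Gaussian noise with standard deviation `sigma`."""
    return [LEVELS[b] + rng.gauss(0.0, sigma) for b in seq for _ in range(dwell)]


def cross_correlation(x, y):
    """Return {lag: sum_n (x[n] - mean_x) * (y[n + lag] - mean_y)} for every
    lag at which x and y overlap."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    r = {}
    for lag in range(-(len(x) - 1), len(y)):
        lo, hi = max(0, -lag), min(len(x), len(y) - lag)
        r[lag] = sum(xc[n] * yc[n + lag] for n in range(lo, hi))
    return r


def peak_lag(x, y):
    """Lag (in samples) at which the cross-correlation is largest."""
    r = cross_correlation(x, y)
    return max(r, key=r.get)


def mutation_type(lag):
    # Positive lag: the mutated signal is delayed, so nucleotides were inserted.
    # Negative lag: the mutated signal is advanced, so nucleotides were deleted.
    if lag > 0:
        return "insertion"
    if lag < 0:
        return "deletion"
    return "no shift"


rng = random.Random(42)
normal = "".join(rng.choice("ACGT") for _ in range(60))
inserted = normal[:2] + "ACG" + normal[2:]  # 3-nt insertion near the start
deleted = normal[:2] + normal[5:]           # 3-nt deletion near the start
x = simulate_signal(normal, rng)
lag_ins = peak_lag(x, simulate_signal(inserted, rng))
lag_del = peak_lag(x, simulate_signal(deleted, rng))
print(lag_ins, mutation_type(lag_ins), lag_del, mutation_type(lag_del))
```

Dividing the peak lag by `dwell` gives the shift in nucleotides (3 by construction here). A single-nucleotide substitution leaves the peak at lag 0, so in this model the correlation peak separates insertions and deletions but not missense mutations from the norm.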
Insertion and deletion mutations of a nucleotide sequence are often classified without errors, because these mutations shift the signal by several nucleotides relative to the normal signal, which increases the distance between the normal and pathological signals. The machine learning methods that classified nanopore DNA sequencing signals with high accuracy include linear discriminant analysis, the k-nearest neighbors classifier (with Euclidean distance and a sufficient number of nearest neighbors), and support vector machines. The best results were obtained with support vector machines: with linear, quadratic, and cubic kernel functions, the proportion of correctly classified signals ranged from 93 to 100%.
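The effect of noise on classification can be explored with a k-nearest neighbors classifier using Euclidean distance, one of the methods named above (chosen here because it fits in the standard library; the paper's best results came from SVMs). This is a minimal sketch under stated assumptions: hypothetical current levels, a fixed `dwell`, i.i.d. Gaussian noise, one random reference sequence, and truncation of all signals to the length of the shortest one so that Euclidean distance is defined. The text does not say how signals of different lengths were compared, so the truncation is an assumption, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical blockade currents (pA); same illustrative ordering G < T < A < C.
LEVELS = {"G": 40.0, "T": 45.0, "A": 50.0, "C": 55.0}


def simulate(seq, rng, dwell, sigma):
    """Piecewise-constant current plus i.i.d. Gaussian noise (std `sigma`)."""
    return [LEVELS[b] + rng.gauss(0.0, sigma) for b in seq for _ in range(dwell)]


def make_classes(ref, indel_pos=10, sub_pos=20):
    """Build the four sequence classes named in the text from one reference."""
    sub = "C" if ref[sub_pos] == "G" else "G"  # substitute a base with another level
    return {
        "normal": ref,
        "missense": ref[:sub_pos] + sub + ref[sub_pos + 1:],
        "insertion": ref[:indel_pos] + "ACG" + ref[indel_pos:],
        "deletion": ref[:indel_pos] + ref[indel_pos + 3:],
    }


def knn_predict(train, x, k):
    """Majority label among the k training signals nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def knn_accuracy(sigma, n_per_class=20, dwell=5, k=5, seed=1):
    rng = random.Random(seed)
    ref = "".join(rng.choice("ACGT") for _ in range(40))
    classes = make_classes(ref)
    # Assumption: truncate every signal to the shortest one (the deletion)
    # so that all feature vectors have the same length.
    length = min(len(s) for s in classes.values()) * dwell

    def sample(label):
        return simulate(classes[label], rng, dwell, sigma)[:length]

    train = [(sample(c), c) for c in classes for _ in range(n_per_class)]
    test = [(sample(c), c) for c in classes for _ in range(n_per_class // 2)]
    correct = sum(knn_predict(train, x, k) == c for x, c in test)
    return correct / len(test)


for sigma in (0.5, 2.0, 5.0):
    print(f"sigma={sigma}: accuracy={knn_accuracy(sigma):.2f}")
```

As `sigma` grows, errors in this model are expected to concentrate in the normal/missense pair, which differ in only `dwell` samples, while the insertion and deletion classes stay far from the norm because of the shift. A support vector classifier such as scikit-learn's `SVC(kernel="poly", degree=3)` could be substituted for `knn_predict` to approximate the cubic-kernel setting mentioned above.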

Simulation and Analysis of Bionanopore DNA Sequencing Signals for Genetic Mutations Detection