Analysis of DNA Sequence Pattern Using Probabilistic Neural Network Model

Xiaoming Wu, F. X. Lu, Bo Wang, Jianhua Cheng
2005-01-01
Abstract: To discover frequently occurring DNA patterns related to inherent diseases or gene-regulation-associated diseases, we must clarify which sequences interact with transcription factors in the genome. A probabilistic neural network model was introduced to represent variable-length DNA sequence patterns. This model, combined with an EM algorithm, was used to discover conserved sequence patterns in DNA sequences and was successfully tested on two datasets, one containing simulated sequences and the other containing upstream sequences of genes in E. coli. Both fixed-length and variable-length patterns were discovered in the two datasets. The sensitivity of this method was higher than that of the two methods it was compared with, and regulatory sequences of genes were discovered in real DNA sequences of gene clusters. This method could also be used to discover patterns in protein sequences.
ACM Classification: G.3 (Probability and Statistics – Probabilistic algorithms), I.5.1 (Pattern Recognition – Models – Neural nets), J.3 (Life and Medical Sciences – Biology and genetics)
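The pairing of a probabilistic sequence model with EM that the abstract describes can be illustrated with a much simpler stand-in. The sketch below is not the authors' probabilistic neural network and does not handle variable-length patterns: it assumes a fixed-length position weight matrix and exactly one pattern occurrence per sequence. The function name `em_motif`, its parameters, and the toy sequences are hypothetical and chosen only for illustration.

```python
import numpy as np

ALPHABET = "ACGT"

def em_motif(seqs, width, n_iter=50, pseudo=0.1, seed=0):
    """EM for a fixed-length motif, assuming one occurrence per sequence.

    Returns the position weight matrix (width x 4) and, for each sequence,
    the posterior probability of the motif starting at each position.
    """
    rng = np.random.default_rng(seed)
    # Encode each sequence as an integer array over A, C, G, T.
    idx = [np.array([ALPHABET.index(c) for c in s]) for s in seqs]

    # Background base frequencies estimated from all sequences.
    counts = np.bincount(np.concatenate(idx), minlength=4) + pseudo
    bg = counts / counts.sum()

    # Random initial position weight matrix; each row sums to 1.
    pwm = rng.dirichlet(np.ones(4), size=width)

    posts = []
    for _ in range(n_iter):
        new = np.full((width, 4), pseudo)  # pseudocounts avoid zero entries
        posts = []
        for x in idx:
            n_pos = len(x) - width + 1
            # All candidate windows of the given width, shape (n_pos, width).
            windows = np.stack([x[i:i + width] for i in range(n_pos)])
            # E-step: posterior over start positions from the
            # motif-versus-background log-likelihood ratio.
            log_ratio = np.log(
                pwm[np.arange(width), windows] / bg[windows]
            ).sum(axis=1)
            z = np.exp(log_ratio - log_ratio.max())
            z /= z.sum()
            posts.append(z)
            # M-step accumulation: expected base counts per motif column.
            for j in range(width):
                new[j] += np.bincount(windows[:, j], weights=z, minlength=4)
        pwm = new / new.sum(axis=1, keepdims=True)
    return pwm, posts

if __name__ == "__main__":
    toy = ["ACGTTGACAGCT", "TTGACATTCGCA", "GGCATTGACAAC"]
    pwm, posts = em_motif(toy, width=6)
    # Most probable start position of the pattern in each toy sequence.
    print([int(p.argmax()) for p in posts])
```

EM with a random start can settle in a local optimum, so in practice such a procedure is usually run from several initializations and the highest-likelihood result kept.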