Estimate the Occurrence Rate of the DNA Palindromes

I-Ping Tu, Yuan-Fu Huang, Shao-Hsuan Wang
DOI: https://doi.org/10.48550/arXiv.1104.5064
2011-04-27
Applications
Abstract: A DNA palindrome is a segment of double-stranded DNA sequence with inversion symmetry, which may form secondary structures conferring significant biological functions ranging from RNA transcription to DNA replication. Testing whether clusters of DNA palindromes are distributed randomly is an interesting bioinformatic problem, in which the occurrence rate of DNA palindromes is a key estimator for setting up a test. The statistic most commonly used to estimate the occurrence rate for scan statistics is the average rate. However, in our simulation, the average rate may be twice the null occurrence rate of DNA palindromes because of hot-spot regions of 3000 bp in a herpes virus genome. Here, we propose a formula to estimate the occurrence rate through an analytic derivation under a Markov assumption on the DNA sequence. Our simulation study shows that this method improves accuracy and robustness against hot spots compared with the commonly used average rate. In addition, we derive analytical formulas for the moment-generating functions of various statistics under a Markov model, enabling further calculation of p-values.
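To make the baseline concrete, the following is a minimal Python sketch of the average-rate estimator that the abstract compares against. It is not the authors' proposed Markov-based formula, which the abstract does not state. It assumes that a DNA palindrome is a segment equal to its own reverse complement, and that the average rate is the fraction of fixed-length windows that are palindromes. The function names and the fixed window length are hypothetical choices made for illustration.

```python
# Watson-Crick base pairing, used to build the reverse complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_dna_palindrome(segment):
    """Return True if the segment equals its own reverse complement."""
    # A segment of odd length cannot equal its reverse complement,
    # because the middle base would have to pair with itself.
    if len(segment) % 2 != 0:
        return False
    try:
        revcomp = "".join(COMPLEMENT[base] for base in reversed(segment))
    except KeyError:
        # Ambiguous or non-ACGT symbols are treated as non-palindromic.
        return False
    return segment == revcomp


def palindrome_positions(sequence, length):
    """Start indices of every window of the given length that is a palindrome."""
    return [
        i
        for i in range(len(sequence) - length + 1)
        if is_dna_palindrome(sequence[i:i + length])
    ]


def average_rate(sequence, length):
    """Fraction of windows of the given length that are palindromes.

    Assumption: this is the naive average-rate estimate; it pools
    hot-spot regions with the rest of the sequence.
    """
    n_windows = len(sequence) - length + 1
    if n_windows <= 0:
        return 0.0
    return len(palindrome_positions(sequence, length)) / n_windows
```

For instance, `average_rate(genome, 10)` would give the share of 10 bp windows in a string `genome` that are palindromes. Because every window carries equal weight, a dense cluster of palindromes raises this estimate for the whole sequence, which is the sensitivity to hot spots that the abstract describes.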