Identification of repeats in DNA sequences using nucleotide distribution uniformity

Changchuan Yin
DOI: https://doi.org/10.48550/arXiv.1608.00567
2016-08-01
Abstract: Repetitive elements are important in genomic structure, function, and regulation, yet effective methods for precisely identifying repetitive elements in DNA sequences are not readily available, and the relationship between repetitive elements and the periodicities of genomes is not clearly understood. We present an $\textit{ab initio}$ method to quantitatively detect repetitive elements and infer their consensus repeat pattern. The method measures the distribution uniformity of nucleotides at periodic positions in DNA sequences or genomes. It can identify the periodicities, consensus repeat patterns, copy numbers, and perfect levels of repetitive elements. Results from applying the method to different DNA sequences and genomes demonstrate its efficacy and accuracy in identifying repeat patterns and periodicities. The computational complexity of the method is linear in the length of the analyzed sequence.
Subjects: Genomics; Computational Engineering, Finance, and Science; Computer Vision and Pattern Recognition; Data Structures and Algorithms
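
The idea in the abstract, scoring a candidate period by how unevenly nucleotides are distributed across its periodic positions, can be illustrated with a minimal sketch. This is not the paper's exact definition: the variance-based score, its normalization, and the function names `nonuniformity`, `best_period`, and `consensus` are assumptions chosen for illustration only. Each candidate period is scored in a single pass over the sequence.

```python
from collections import Counter


def phase_counts(seq, period):
    """Count each nucleotide at every phase (position index mod period)."""
    counts = [Counter() for _ in range(period)]
    for i, base in enumerate(seq.upper()):
        counts[i % period][base] += 1
    return counts


def nonuniformity(seq, period):
    """Illustrative score (an assumption, not the paper's formula):
    summed variance of per-phase counts for A, C, G, T, normalized by
    the squared number of copies so that sequence lengths are comparable."""
    if not seq:
        return 0.0
    counts = phase_counts(seq, period)
    score = 0.0
    for base in "ACGT":
        per_phase = [counts[ph][base] for ph in range(period)]
        mean = sum(per_phase) / period
        score += sum((c - mean) ** 2 for c in per_phase) / period
    copies = len(seq) / period
    return score / copies ** 2


def best_period(seq, max_period):
    """Return the candidate period in 2..max_period with the highest score."""
    return max(range(2, max_period + 1), key=lambda p: nonuniformity(seq, p))


def consensus(seq, period):
    """Most frequent nucleotide at each phase; 'N' if a phase is empty."""
    return "".join(
        c.most_common(1)[0][0] if c else "N" for c in phase_counts(seq, period)
    )


if __name__ == "__main__":
    dna = "ACG" * 20
    p = best_period(dna, 5)
    print(p, consensus(dna, p), len(dna) // p)  # period, pattern, copy number
```

Under this scoring, multiples of the true period score as high as the period itself, so a practical detector would need a rule for preferring the smallest such period; the sketch sidesteps this by keeping `max_period` below twice the true period.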