An Efficient Mining Algorithm for Key Segment from DNA Sequences

Guojun Mao
DOI: https://doi.org/10.1109/ccece.2015.7129310
2015-01-01
Abstract: Unlike transaction sequences in business, DNA sequences typically have a small alphabet and a long length, so mining them poses different challenges from other applications. This paper addresses the problem of mining key segments from long DNA sequences. We design a compact data structure, called the Association Matrix, that maintains in memory the statistical information gathered while scanning DNA sequences. Based on the Association Matrix, we present an algorithm for mining key segments from a very long DNA sequence. We also evaluate the approach on synthetic and real-life data sets, and the experiments confirm its good time and space performance.
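The abstract does not define the Association Matrix, so the following is only an illustrative sketch of the general idea of keeping compact, fixed-size statistics while scanning a long sequence over a small alphabet. It assumes, purely for illustration, that the statistics are counts of adjacent nucleotide pairs; the names `pair_count_matrix` and `frequent_pairs` are hypothetical and are not taken from the paper.

```python
from typing import List

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def pair_count_matrix(sequence: str) -> List[List[int]]:
    """Count adjacent symbol pairs in a single left-to-right scan.

    Assumption: this 4x4 count matrix is a stand-in for the paper's
    Association Matrix, whose actual definition the abstract does not give.
    Memory use depends on the alphabet size, not on the sequence length.
    """
    counts = [[0] * len(ALPHABET) for _ in ALPHABET]
    prev = None
    for base in sequence:
        cur = INDEX.get(base)  # None for symbols outside ACGT, e.g. 'N'
        if prev is not None and cur is not None:
            counts[prev][cur] += 1
        prev = cur
    return counts


def frequent_pairs(counts: List[List[int]], min_count: int) -> List[str]:
    """Return the pairs whose count reaches a hypothetical threshold."""
    return [
        ALPHABET[i] + ALPHABET[j]
        for i in range(len(ALPHABET))
        for j in range(len(ALPHABET))
        if counts[i][j] >= min_count
    ]


counts = pair_count_matrix("ACGTAC")
print(frequent_pairs(counts, 2))
```

Because the matrix has a fixed size for a four-letter alphabet, a sketch like this needs only one pass and constant extra memory, which is the kind of property the abstract attributes to its compact in-memory structure.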