An Efficient Algorithm for Mining DNA Sequences Based on the Association Matrix

Guojun Mao (毛国君)
DOI: https://doi.org/10.12677/csa.2015.58035
2015-01-01
Computer Science and Application
Abstract: DNA analysis is at the core of bioinformatics research, and data mining, as an important supporting technology for bioinformatics, has been widely applied to the analysis of DNA sequences. Compared with transaction sequences in traditional business domains, DNA sequences contain very few distinct items but are extremely long, so classic sequence mining algorithms are not well suited to DNA sequence pattern mining. Based on an analysis of the requirements of DNA sequence mining, we propose an efficient data structure called the Association Matrix. This structure compresses a long DNA sequence into a matrix form that can be analyzed effectively. Because the structure is compact, very long DNA sequences can be processed within limited memory. Building on the Association Matrix, we design an efficient mining algorithm that finds key segments in DNA sequences. Experiments show that the proposed algorithm performs well in DNA sequence mining.
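The abstract does not specify the layout of the Association Matrix. As a minimal sketch, assuming one plausible reading that the source does not confirm (a 4×4 matrix counting how often each nucleotide is immediately followed by each other nucleotide), the Python below shows how a sequence of arbitrary length can be streamed in chunks into a fixed-size matrix and then scanned for frequent segments. The names `build_association_matrix`, `frequent_pairs`, and `min_support` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple

BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}


def build_association_matrix(chunks: Iterable[str]) -> List[List[int]]:
    """Hypothetical simplification of an association matrix.

    matrix[i][j] counts the positions where base BASES[i] is immediately
    followed by base BASES[j]. Only 16 counters are stored, no matter how
    long the sequence is, so the input can be streamed chunk by chunk.
    """
    matrix = [[0] * 4 for _ in range(4)]
    prev = None  # last valid base seen; carries adjacency across chunk borders
    for chunk in chunks:
        for ch in chunk.upper():
            cur = INDEX.get(ch)
            if cur is None:
                # Ambiguous symbols such as 'N' are skipped and break adjacency.
                prev = None
                continue
            if prev is not None:
                matrix[prev][cur] += 1
            prev = cur
    return matrix


def frequent_pairs(matrix: List[List[int]], min_support: int) -> List[Tuple[str, int]]:
    """Return dinucleotides with count >= min_support, most frequent first."""
    pairs = [
        (BASES[i] + BASES[j], matrix[i][j])
        for i in range(4)
        for j in range(4)
        if matrix[i][j] >= min_support
    ]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    # "ACGTAC" + "GTACGT" is the sequence ACGTACGTACGT split into two chunks.
    m = build_association_matrix(["ACGTAC", "GTACGT"])
    print(frequent_pairs(m, min_support=3))
```

Running the example prints `[('AC', 3), ('CG', 3), ('GT', 3)]`; the C-to-G pair spanning the chunk border is counted because the last base of each chunk is carried forward. Recovering key segments longer than two bases would require more information than this sketch stores.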