Similarity analysis of DNA sequences based on k-word

Yingxin Hu, Zhaohui Qi, Lijuan Zheng, Wenfeng Zhou
DOI: https://doi.org/10.1109/PIC.2014.6972409
2014-01-01
Abstract: A method based on the positions and counts of k-words is proposed for comparing genetic sequences and inferring evolutionary relationships. Each DNA sequence is represented by a characteristic vector whose elements are the average distances from the beginning of the k-word. The approach gives a one-to-one correspondence between DNA sequences and vectors. As test datasets, 48 hepatitis E virus (HEV) sequences and several mammalian species are used to reconstruct phylogenetic trees from the Euclidean distances between vectors. Compared with other methods, the results show that this method is efficient and suitable for similarity analysis.
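The abstract does not give the exact construction of the vector, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each element is the mean 1-based start position of one k-word, measured from the start of the sequence, with 0.0 for absent words. The names `kword_vector` and `euclidean` are hypothetical, and this simplified vector is not claimed to preserve the one-to-one property stated in the paper.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def kword_vector(seq, k=2):
    """Return one value per k-word: the mean 1-based start position of its
    occurrences in seq, or 0.0 if the word never occurs (assumed convention)."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    positions = {w: [] for w in words}
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if w in positions:  # skip windows containing other symbols, e.g. N
            positions[w].append(i + 1)
    return [
        sum(positions[w]) / len(positions[w]) if positions[w] else 0.0
        for w in words
    ]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = kword_vector("ACGTACGT", k=2)
    b = kword_vector("ACGTTGCA", k=2)
    print(euclidean(a, b))
```

A pairwise matrix of such distances could then be passed to any standard tree-building routine. Which one the authors used is not stated in the abstract.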