Similarity analysis of DNA sequences based on the relative entropy

Wenlu Yang, Xiongjun Pi, Liqing Zhang
DOI: https://doi.org/10.1007/11539087_137
2005-01-01
Abstract: This paper investigates the similarity of two sequences, one of the main issues in fragment clustering and classification when sequencing the genomes of microbial communities sampled directly from the natural environment. We use the relative entropy as a criterion for the similarity of two sequences and discuss its characteristics in DNA sequences. A method for evaluating the relative entropy is presented and applied to the comparison of two sequences. By combining the relative entropy with the length of the variables defined in this paper, the similarity of sequences is easily obtained. Self-organizing maps (SOM) and principal component analysis (PCA) are applied to cluster subsequences from different genomes. Computer simulations show that the method performs well.
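As a rough illustration of using relative entropy to compare two DNA sequences, the Python sketch below estimates a dinucleotide (k = 2) frequency distribution for each sequence and computes the Kullback-Leibler divergence D(P||Q) = sum over x of P(x) log2(P(x)/Q(x)) between them. This is not the evaluation method presented in the paper. The use of k-mer frequencies as the distribution, the value k = 2, the add-one pseudocount (which keeps every probability nonzero so the divergence stays finite), the symmetrized score D(P||Q) + D(Q||P), and the function names `kmer_distribution`, `relative_entropy`, and `symmetric_relative_entropy` are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq: str, k: int = 2, pseudocount: float = 1.0) -> dict:
    """Estimate the k-mer frequency distribution of a DNA sequence.

    Assumption (not from the paper): a pseudocount is added to every possible
    k-mer so no probability is zero and the relative entropy stays finite.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count overlapping k-mers, skipping any window with non-ACGT symbols (e.g. N).
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def relative_entropy(p: dict, q: dict) -> float:
    """Kullback-Leibler divergence D(P || Q) in bits; asymmetric in P and Q."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def symmetric_relative_entropy(p: dict, q: dict) -> float:
    """Hypothetical symmetrized score D(P || Q) + D(Q || P)."""
    return relative_entropy(p, q) + relative_entropy(q, p)


# Usage: two ACGT-rich sequences versus a G/C repeat.
a = "ACGTACGTACGTACGT"
b = "ACGTACGAACGTTCGT"
c = "GGGGCCCCGGGGCCCC"
p_a, p_b, p_c = (kmer_distribution(s) for s in (a, b, c))
d_ab = symmetric_relative_entropy(p_a, p_b)
d_ac = symmetric_relative_entropy(p_a, p_c)
print(f"d(a, b) = {d_ab:.3f} bits, d(a, c) = {d_ac:.3f} bits")
```

Smaller values indicate more similar composition, so here `a` and `b` should score much closer to each other than `a` and `c`. Vectors of such k-mer frequencies are one kind of per-fragment feature that could be fed to SOM or PCA for clustering, although the abstract does not specify the paper's exact feature construction.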