Analysis of Similarities/Dissimilarities of DNA Sequences Based on a Novel Graphical Representation

Jiafeng Yu, Jihua Wang, Xiao Sun
2010-01-01
Abstract: According to the physicochemical property of the base at the first site, the 16 kinds of dinucleotides are classified into four groups. Based on this classification, we propose a novel graphical representation of DNA sequences that avoids the loss of information caused by the curve overlapping or crossing itself. This representation allows direct inspection of the composition and distribution of dinucleotides, as well as visual recognition of similarities/dissimilarities among different sequences. From this representation, a 6D vector is derived as a quantitative descriptor; it can display both the global and local features of DNA sequences in a 6D phase space. Applications to the similarity/dissimilarity analysis of the complete coding sequences of the beta-globin genes of eleven species illustrate their usefulness.
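The abstract does not state which physicochemical property defines the four groups, nor how the 6D vector is computed. As a minimal sketch, assuming each group simply collects the four dinucleotides that share the same first base, the Python below groups the 16 dinucleotides and compares two sequences by the Euclidean distance between their overlapping-dinucleotide frequency vectors. The names `group_dinucleotides`, `dinucleotide_frequencies` and `distance` are hypothetical, and the 16-component frequency vector is a plain composition-based stand-in, not the authors' graphical curve or their 6D descriptor.

```python
"""Hypothetical sketch: group the 16 dinucleotides by first base and compare
sequences by a composition vector. This is NOT the paper's 6D descriptor,
whose definition is not given in the abstract."""
from collections import Counter
from itertools import product
import math

BASES = "ACGT"
# The 16 dinucleotides in lexicographic order, so each group of four
# (same first base) is contiguous: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def group_dinucleotides() -> dict[str, list[str]]:
    """Assumption: one group per first base, giving four groups of four."""
    groups: dict[str, list[str]] = {b: [] for b in BASES}
    for d in DINUCLEOTIDES:
        groups[d[0]].append(d)
    return groups


def dinucleotide_frequencies(seq: str) -> list[float]:
    """Relative frequencies of overlapping dinucleotides, in DINUCLEOTIDES order.

    Pairs containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(p for p in pairs if p[0] in BASES and p[1] in BASES)
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[d] / total for d in DINUCLEOTIDES]


def distance(seq_a: str, seq_b: str) -> float:
    """Euclidean distance between frequency vectors; smaller means more similar."""
    fa = dinucleotide_frequencies(seq_a)
    fb = dinucleotide_frequencies(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


if __name__ == "__main__":
    print(group_dinucleotides()["A"])  # ['AA', 'AC', 'AG', 'AT']
    print(distance("ACGTACGT", "ACGTACGT"))  # 0.0 for identical toy sequences
```

In a pairwise study such as the eleven-species comparison, one would compute `distance` for every pair of sequences to fill a similarity matrix; the actual paper uses its own 6D descriptor for that step.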