A Novel Technique for Analyzing the Similarity and Dissimilarity of DNA Sequences
Y. W. Liu, Y. Peng
DOI: https://doi.org/10.4238/2014.january.28.2
2014-01-01
Genetics and Molecular Research
Abstract: l(i,j) denotes the distance between the point (x_i, y_i) and the point (x_j, y_j) in the graphical representation. By classifying l(i,j), i, j = 1, 2, …, N, according to the number of points between (x_i, y_i) and (x_j, y_j), N - 1 types are obtained. The average and variance of every type are assembled into the novel invariant v = (a_1, d_1, a_2, d_2, …, a_{N-1}, d_{N-1}). Compared with the traditional invariants (the leading eigenvalue, the max-min (eigenvalue), the leading eigenvalue/N, the average matrix element, and the average row sum), this strategy follows the principle of using the average, extracts more information about biological sequences, and reduces the amount of computation. It is superior to the traditional invariants in predicting similarity and dissimilarity among different species.
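The construction of the invariant can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `distance_invariant` is hypothetical, and the input is assumed to be a list of 2D points already produced by some graphical representation, which the abstract does not specify. "The number of points between" two points is read here as the index separation k = j - i, and the variance is assumed to be the population variance.

```python
import math
from statistics import fmean, pvariance


def distance_invariant(points):
    """Return [a_1, d_1, ..., a_{N-1}, d_{N-1}] for a list of N 2D points.

    Assumption: type k groups the Euclidean distances l(i, i + k), i.e. the
    pairs of points with k - 1 other points between them along the curve.
    """
    n = len(points)
    invariant = []
    for k in range(1, n):
        # All distances between points that are k steps apart.
        dists = [math.dist(points[i], points[i + k]) for i in range(n - k)]
        # a_k: average of type k; d_k: population variance of type k.
        invariant.append(fmean(dists))
        invariant.append(pvariance(dists))
    return invariant


if __name__ == "__main__":
    # Toy curve with three points, not derived from a real DNA sequence.
    print(distance_invariant([(0, 0), (3, 4), (3, 0)]))
```

For N points the sketch yields 2(N - 1) numbers, and it works directly on the pairwise distances without building a matrix or computing eigenvalues.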