K-Mer Sparse Matrix Model for Genetic Sequence and Its Applications in Sequence Comparison.

Jia Wen, Yuyan Zhang, Stephen S. T. Yau
DOI: https://doi.org/10.1016/j.jtbi.2014.08.028
Impact factor (IF): 2.405
2014-01-01
Journal of Theoretical Biology
Abstract: Based on the k-mer model for genetic sequences, a k-mer sparse matrix representation is proposed to record the types and sites (positions) of the k-mers appearing in a genetic sequence; there is a one-to-one correspondence between a genetic sequence and its associated k-mer sparse matrix. Using the singular value decomposition of the k-mer sparse matrix, a k-mer singular value vector is constructed and used to numerically quantify the characteristics of a genetic sequence. We also investigate and evaluate how to choose the optimal value k* of k for the k-mer sparse matrix model. To show the usefulness of the method, it is applied to the comparison of genetic sequences, and the results show that the proposed method is powerful for analyzing and determining the relationships among genetic sequences.
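The abstract does not spell out the matrix layout, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) one row per possible k-mer over ACGT in lexicographic order and one column per start position, with a 1 marking which k-mer occurs at which site; (2) singular value vectors zero-padded to length 4**k so that sequences of different lengths are comparable; and (3) Euclidean distance as the comparison metric. All function names (`kmer_sparse_matrix`, `sequence_from_matrix`, `singular_value_vector`, `distance`) are hypothetical. The matrix is stored densely with NumPy for simplicity, which is only practical for small k.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"  # assumption: unambiguous DNA only (no N or other IUPAC codes)


def all_kmers(k: int) -> list[str]:
    """All 4**k k-mers in lexicographic order over ACGT (assumed row order)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_sparse_matrix(seq: str, k: int) -> np.ndarray:
    """Rows = k-mer types, columns = start positions (sites); a 1 marks
    which k-mer type occurs at which site. Stored densely for simplicity."""
    if len(seq) < k or set(seq) - set(ALPHABET):
        raise ValueError("sequence must be ACGT-only and at least k long")
    row_of = {kmer: r for r, kmer in enumerate(all_kmers(k))}
    n_sites = len(seq) - k + 1
    mat = np.zeros((4 ** k, n_sites))
    for j in range(n_sites):
        mat[row_of[seq[j:j + k]], j] = 1.0
    return mat


def sequence_from_matrix(mat: np.ndarray, k: int) -> str:
    """Recover the sequence by reading the k-mer in each column and chaining
    overlapping k-mers; illustrates the one-to-one correspondence."""
    kmers = all_kmers(k)
    sites = [kmers[int(np.argmax(mat[:, j]))] for j in range(mat.shape[1])]
    return sites[0] + "".join(kmer[-1] for kmer in sites[1:])


def singular_value_vector(seq: str, k: int) -> np.ndarray:
    """Singular values in descending order, zero-padded to length 4**k
    (padding is an assumption made so different lengths are comparable)."""
    sv = np.linalg.svd(kmer_sparse_matrix(seq, k), compute_uv=False)
    return np.pad(sv, (0, 4 ** k - sv.size))


def distance(seq_a: str, seq_b: str, k: int) -> float:
    """Euclidean distance between singular value vectors (assumed metric)."""
    diff = singular_value_vector(seq_a, k) - singular_value_vector(seq_b, k)
    return float(np.linalg.norm(diff))


if __name__ == "__main__":
    m = kmer_sparse_matrix("ACGTAC", 2)
    print(m.shape)                     # (16, 5)
    print(sequence_from_matrix(m, 2))  # ACGTAC
    print(distance("ACGTACGT", "ACGTTCGA", 2))
```

Under this particular assumed layout every column holds exactly one 1, so the rows are mutually orthogonal and the singular values reduce to the square roots of the k-mer counts (for "ACGTAC" with k = 2, the largest is sqrt(2) because "AC" occurs twice). A different matrix construction would give different singular values, so this sketch should be checked against the paper's own definitions before reuse. A pairwise distance matrix built with `distance` could then feed a standard clustering or tree-building step.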