Research on Counting Algorithm of K-Mer Occurrence in DNA Sequence

WANG Shulin, WANG Ji, CHEN Huowang, ZHANG Dingxing
DOI: https://doi.org/10.3969/j.issn.1000-3428.2007.09.014
2007-01-01
Abstract: The structure of a whole genome is closely related to its functions, which are expressed through its subsequences, so studying the structure of DNA sequences is of great significance to bioinformatics. This paper studies the problem of counting all k-mers in a whole genome. It designs and implements an algorithm, in both internal-memory and external-memory versions, that counts the occurrences of all k-mers in DNA sequences. Using a hash function that maps each k-mer to an integer, the algorithm reduces counting k-mers to counting integer keys, and then applies the classic B-tree algorithm to solve the k-mer counting problem. Based on the characteristics of the counting problem, the paper also proposes three measures that further improve the algorithm's efficiency.
Key words: k-mer; DNA sequence; B-tree; whole genome
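To make the reduction described in the abstract concrete, here is a minimal in-memory sketch. It assumes a 2-bit encoding (A=0, C=1, G=2, T=3) as the k-mer-to-integer hash; the abstract does not specify the paper's actual hash function. Each integer code is then counted in a textbook B-tree (CLRS-style insertion with preemptive splitting) whose entries carry an occurrence count. The names `kmer_codes`, `decode`, `BTreeCounter`, `count_kmers` and the minimum-degree parameter `t` are hypothetical, windows containing non-ACGT characters are simply skipped (also an assumption), and the paper's external-memory version and its three efficiency measures are not reproduced.

```python
from bisect import bisect_left

# Assumption: 2-bit encoding of bases; the source only says "a hash function".
BASE = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_codes(seq, k):
    """Yield the integer code of every k-mer, skipping windows with non-ACGT bases."""
    mask = (1 << (2 * k)) - 1
    code, valid = 0, 0  # valid = run length of consecutive ACGT bases
    for ch in seq.upper():
        b = BASE.get(ch)
        if b is None:
            code, valid = 0, 0
            continue
        code = ((code << 2) | b) & mask  # rolling update: drop oldest base, append new one
        valid += 1
        if valid >= k:
            yield code

def decode(code, k):
    """Turn an integer code back into its k-mer string."""
    out = []
    for _ in range(k):
        out.append("ACGT"[code & 3])
        code >>= 2
    return "".join(reversed(out))

class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.keys = []      # sorted integer k-mer codes
        self.counts = []    # counts[i] = occurrences of keys[i]
        self.children = []  # empty for leaves

class BTreeCounter:
    """Hypothetical B-tree keyed by k-mer code; each node holds at most 2t-1 keys."""
    def __init__(self, t=16):
        self.t = t
        self.root = Node()

    def _split_child(self, parent, i):
        t, child = self.t, parent.children[i]
        right = Node(leaf=child.leaf)
        # The median entry moves up into the (non-full) parent.
        parent.keys.insert(i, child.keys[t - 1])
        parent.counts.insert(i, child.counts[t - 1])
        parent.children.insert(i + 1, right)
        right.keys, child.keys = child.keys[t:], child.keys[:t - 1]
        right.counts, child.counts = child.counts[t:], child.counts[:t - 1]
        if not child.leaf:
            right.children, child.children = child.children[t:], child.children[:t]

    def add(self, key):
        """Increment the count of key, inserting it with count 1 if absent."""
        if len(self.root.keys) == 2 * self.t - 1:  # grow the tree at the root
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.counts[i] += 1
                return
            if node.leaf:
                node.keys.insert(i, key)
                node.counts.insert(i, 1)
                return
            if len(node.children[i].keys) == 2 * self.t - 1:  # split before descending
                self._split_child(node, i)
                if node.keys[i] == key:
                    node.counts[i] += 1
                    return
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]

    def items(self, node=None):
        """In-order walk: yields (key, count) in ascending key order."""
        node = node or self.root
        for i, key in enumerate(node.keys):
            if not node.leaf:
                yield from self.items(node.children[i])
            yield key, node.counts[i]
        if not node.leaf:
            yield from self.items(node.children[-1])

def count_kmers(seq, k, t=16):
    tree = BTreeCounter(t)
    for code in kmer_codes(seq, k):
        tree.add(code)
    return {decode(c, k): n for c, n in tree.items()}

print(count_kmers("ACGTACGTTA", 2))  # {'AC': 2, 'CG': 2, 'GT': 2, 'TA': 2, 'TT': 1}
```

Because the 2-bit code is injective for a fixed k, distinct k-mers never collide, and the in-order walk returns k-mers in lexicographic order (A < C < G < T). Storing many keys per node is what makes a B-tree a natural fit for the external-memory case the abstract mentions, since a node can be sized to one disk block.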