Memory-Aware BWT by segmenting sequences to support subsequence search

Jiaying Wang, Xiaochun Yang, Bin Wang, Huaijie Zhu
DOI: https://doi.org/10.1007/978-3-642-29253-8_7
2012-01-01
Abstract: The Burrows-Wheeler transform (BWT) has received significant attention in academia for addressing subsequence matching problems. Although BWT is best known as a technique for transforming a sequence into a new sequence that is "easy to compress", it can also be extended into a full-text index. A traditional BWT index requires n log n + n log σ bits to build for a sequence of n characters, where σ is the size of the alphabet. Building a BWT index for a long sequence on PCs with limited memory is therefore a great challenge. To solve this problem, we propose a novel variation of the BWT index named S-BWT, which separates the source sequence into segments. It reduces the memory cost to n(log σ + log n − log k)/k bits, where k is the number of segments. However, querying each segment separately with existing approaches risks losing some significant results. In this paper, we propose two query methods based on S-BWT that are guaranteed to find all subsequence occurrences. Our methods not only require little memory, but are also faster than the state-of-the-art BWT backward search method for long sequences.
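To make the risk of lost results concrete, the following is a minimal Python sketch of textbook BWT backward search, together with a naive per-segment search. It is not the S-BWT index or either of the paper's two query methods, which the abstract does not describe. The function names (`bwt_index`, `backward_search`, `naive_segment_search`), the `$` sentinel, and the equal-length segmentation are all assumptions made for illustration. The sketch also assumes that the lost results include occurrences straddling a segment boundary, which is one plausible reading of the abstract.

```python
def bwt_index(text):
    """Build a naive, uncompressed BWT index: suffix array, C table, Occ table."""
    text += "$"  # sentinel, assumed smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # C[c] = number of characters in text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(index, pattern):
    """Return the sorted start positions of pattern, scanning it right to left."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def naive_segment_search(text, pattern, k):
    """Index k equal-length segments independently and search each one.

    Hypothetical baseline only: it cannot see matches that cross a boundary.
    """
    seg_len = -(-len(text) // k)  # ceiling division
    hits = []
    for start in range(0, len(text), seg_len):
        idx = bwt_index(text[start:start + seg_len])
        hits += [start + p for p in backward_search(idx, pattern)]
    return sorted(hits)


text = "ACGTACGTACGT"
print(backward_search(bwt_index(text), "TACG"))  # [3, 7]
print(naive_segment_search(text, "TACG", 3))     # []
```

With three segments of four characters each, both occurrences of `TACG` cross a segment boundary, so the naive search reports nothing while the whole-sequence index finds positions 3 and 7. A segmented index therefore needs query methods that account for such boundary cases.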