A new lightweight index SUA for biological sequence analysis

Di Wang, Guoren Wang, Baichen Chen, Qingquan Wu, Bin Wang, Donghong Han
DOI: https://doi.org/10.3321/j.issn:1671-4512.2005.z1.059
2005-01-01
Abstract: Searching for repetitions is an important topic in biological sequence analysis, but the indices currently used for it, such as the suffix tree, are bottlenecked by excessive space consumption. To address this bottleneck, the succeeding unit array (SUA), a lightweight index structure, is proposed based on an analysis of repetitions in DNA sequences. SUA is constructed using radix sorting and is also suitable for multi-sequence analysis. Theoretical analysis shows the advantage of SUA in space consumption: for a sequence of length n, SUA occupies only about 5n in the experiments. Its construction is also faster than that of other indices such as the suffix tree.
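The abstract does not describe the internal layout of SUA, so the following is not the authors' structure. It is only a minimal sketch of the general idea the abstract names: using radix sorting over fixed-length units of a DNA sequence so that repeated units end up adjacent. The unit length `k`, the alphabet `ACGT`, and the function names `radix_sort_units` and `repeated_units` are all assumptions made for illustration.

```python
ALPHABET = "ACGT"  # assumed DNA alphabet; real data may also contain N, etc.


def radix_sort_units(seq, k):
    """Return start positions of all length-k units, ordered by unit content.

    Uses least-significant-digit radix sort: one stable bucket pass per
    character position, from the last character of the unit to the first.
    """
    code = {c: i for i, c in enumerate(ALPHABET)}
    positions = list(range(len(seq) - k + 1))
    for offset in range(k - 1, -1, -1):
        buckets = [[] for _ in ALPHABET]
        for p in positions:
            buckets[code[seq[p + offset]]].append(p)
        # Concatenating buckets in alphabet order keeps the pass stable.
        positions = [p for bucket in buckets for p in bucket]
    return positions


def repeated_units(seq, k):
    """Map each length-k unit that occurs more than once to its start positions."""
    order = radix_sort_units(seq, k)
    repeats = {}
    i = 0
    while i < len(order):
        unit = seq[order[i]:order[i] + k]
        j = i + 1
        # Equal units are adjacent after sorting, so scan the run.
        while j < len(order) and seq[order[j]:order[j] + k] == unit:
            j += 1
        if j - i > 1:
            repeats[unit] = order[i:j]
        i = j
    return repeats


if __name__ == "__main__":
    print(repeated_units("ACGTACGA", 3))
```

In this sketch the only index kept is one integer per sequence position, which hints at why an array-based index can be far smaller than a pointer-based suffix tree; the roughly 5n figure reported in the abstract refers to SUA itself, not to this illustration.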