Dynamic Suffix Array in Optimal Compressed Space
Takaaki Nishimoto,Yasuo Tabei
2024-07-12
Abstract: Big data has expanded rapidly, and a considerable portion of it is textual data such as strings and texts. Simple compression methods and standard data structures are inadequate for processing these datasets, because they either require decompression before use or consume extensive memory. This has motivated the development of compressed data structures that support various queries on a given string, typically operating in polylogarithmic time and using compressed space proportional to the string's length. In particular, the suffix array (SA) query is a critical component in implementing a suffix tree, which has a broad spectrum of applications.
A line of research has studied compressed data structures (especially static ones) that support the SA query. A common finding of most of these studies is that existing compressed data structures are suboptimal in space. Kociumaka, Navarro, and Prezza [IEEE Trans. Inf. Theory 2023] made a significant contribution by introducing an asymptotically minimal space requirement, $O\left(\delta \log\frac{n\log\sigma}{\delta\log n} \log n \right)$ bits ($\delta$-optimal space), which is sufficient to represent any string of length $n$ over an alphabet of size $\sigma$ with substring complexity $\delta$, where $\delta$ serves as a measure of repetitiveness. More recently, Kempa and Kociumaka [FOCS 2023] presented $\delta$-SA, a compressed data structure supporting SA queries in $\delta$-optimal space. However, the data structures introduced thus far are static.
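To make the quantities above concrete, the following is a minimal brute-force sketch, not taken from the paper. It assumes the standard definition of substring complexity, $\delta = \max_{k} d_k / k$, where $d_k$ is the number of distinct length-$k$ substrings, and it evaluates the expression inside the $O(\cdot)$ bound with all hidden constants ignored. The function names `substring_complexity` and `delta_bound_bits` are hypothetical.

```python
import math


def substring_complexity(text: str) -> float:
    """Brute-force delta = max over k of (distinct length-k substrings) / k.

    Takes quadratic time and space or worse; intended only for tiny inputs.
    """
    n = len(text)
    best = 0.0
    for k in range(1, n + 1):
        distinct = {text[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


def delta_bound_bits(n: int, sigma: int, delta: float) -> float:
    """Evaluate delta * log(n log(sigma) / (delta log n)) * log n.

    Assumption: base-2 logarithms and constant factor 1, so the result
    only indicates the growth rate, not an actual size in bits.
    """
    ratio = (n * math.log2(sigma)) / (delta * math.log2(n))
    return delta * math.log2(ratio) * math.log2(n)


if __name__ == "__main__":
    print(substring_complexity("banana"))
    print(delta_bound_bits(16, 2, 2.0))
```

For a highly repetitive string, $\delta$ stays small even as $n$ grows, which is why a space bound parameterized by $\delta$ can be far below the uncompressed size.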
We present the first dynamic compressed data structure that supports SA queries and updates in polylogarithmic time and $\delta$-optimal space. More precisely, it answers SA queries in $O(\log^7 n)$ time and performs updates in expected $O(\log^8 n)$ time, using expected $\delta$-optimal space.
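For readers unfamiliar with the operations involved, here is a naive uncompressed baseline, offered as an illustrative sketch and not as the data structure presented in this work. It assumes 0-based positions, no end-of-string sentinel, and that an update is a single-character insertion or deletion. The class name `NaiveDynamicSA` and its methods are hypothetical. Each update rebuilds the whole suffix array, which is exactly the cost a polylogarithmic-time dynamic structure avoids.

```python
class NaiveDynamicSA:
    """Uncompressed baseline: rebuild the suffix array after every update."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._rebuild()

    def _rebuild(self) -> None:
        # Sort suffix start positions by the suffix they begin.
        # Comparing full suffixes makes this slow on long texts.
        n = len(self.text)
        self.sa = sorted(range(n), key=lambda i: self.text[i:])

    def sa_query(self, i: int) -> int:
        """Return the start position of the i-th smallest suffix (0-based)."""
        return self.sa[i]

    def insert(self, pos: int, ch: str) -> None:
        """Insert character ch so that it ends up at position pos."""
        self.text = self.text[:pos] + ch + self.text[pos:]
        self._rebuild()

    def delete(self, pos: int) -> None:
        """Delete the character at position pos."""
        self.text = self.text[:pos] + self.text[pos + 1:]
        self._rebuild()


if __name__ == "__main__":
    index = NaiveDynamicSA("banana")
    print(index.sa)
    index.insert(0, "a")
    print(index.sa)
```

The example shows that a single-character update can change many suffix array entries at once, which is what makes maintaining SA access under updates in compressed space challenging.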
Data Structures and Algorithms