Space-efficient conversions from SLPs

Travis Gagie, Adrián Goga, Artur Jeż, Gonzalo Navarro
2023-10-10
Abstract: We give algorithms that, given a straight-line program (SLP) with $g$ rules that generates (only) a text $T[1..n]$, build within $O(g)$ space the Lempel-Ziv (LZ) parse of $T$ (of $z$ phrases) in time $O(n\log^2 n)$ or in time $O(gz\log^2(n/z))$. We also show how to build a locally consistent grammar (LCG) of optimal size $g_{lc} = O(\delta\log\frac{n}{\delta})$ from the SLP within $O(g+g_{lc})$ space and in $O(n\log g)$ time, where $\delta$ is the substring complexity measure of $T$. Finally, we show how to build the LZ parse of $T$ from such an LCG within $O(g_{lc})$ space and in time $O(z\log^2 n \log^2(n/z))$. All our results hold with high probability.
Data Structures and Algorithms
What problem does this paper attempt to address?
The paper addresses space-efficient conversion between compressed representations of large, highly repetitive text collections. Given a straight-line program (SLP) that generates a text \(T\), it gives algorithms that build the Lempel-Ziv (LZ) parse of \(T\) and a locally consistent grammar (LCG) for \(T\) while using working space proportional to the size of the compressed input and output rather than to the length \(n\) of the text.

### Main problems

1. **Conversion from an SLP to the LZ parse**:
   - One algorithm runs in \(O(n \log^2 n)\) time within \(O(g)\) space.
   - A second, fully compressed algorithm runs in \(O(gz \log^2(n/z))\) time, also within \(O(g)\) space, where \(g\) is the number of rules of the SLP and \(z\) is the number of phrases in the LZ parse.
2. **Conversion from an SLP to a specific LCG**:
   - The algorithm builds the LCG in \(O(n \log g)\) time within \(O(g + g_{\text{lc}})\) space, where \(g_{\text{lc}} = O(\delta \log \frac{n}{\delta})\) is the optimal size of the LCG and \(\delta\) is the substring complexity measure of \(T\).
3. **Fully compressed conversion from an LCG to the LZ parse**:
   - Starting from such an LCG, the algorithm builds the LZ parse in \(O(z \log^2 n \log^2(n/z))\) time within \(O(g_{\text{lc}})\) space.

### Background and motivation

Large, highly repetitive text collections, such as genomic data, are increasingly kept in compressed form. Working with them then requires computing directly on the compressed data, for example for text search and mining, without decompressing it. Different compression formats suit different tasks, so efficient conversion between formats matters.

### Main contributions

1. **SLP to LZ parse**: \(O(n \log^2 n)\) time, or \(O(gz \log^2(n/z))\) time for the fully compressed variant, both within \(O(g)\) space.
2. **SLP to LCG**: an LCG of size \(O(\delta \log \frac{n}{\delta})\), built in \(O(n \log g)\) time within \(O(g + g_{\text{lc}})\) space.
3. **LCG to LZ parse**: \(O(z \log^2 n \log^2(n/z))\) time within \(O(g_{\text{lc}})\) space.

All results hold with high probability. In each case the working space depends only on the sizes of the compressed representations, so none of the conversions needs \(\Theta(n)\) space to hold the decompressed text.
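To make the quantities \(g\), \(n\), and \(z\) concrete, here is a minimal Python sketch of a toy SLP and a naive LZ parser. It is not the paper's algorithm: it decompresses the text fully and runs in quadratic time, which is exactly what the space-efficient conversions avoid. The helper names (`fibonacci_slp`, `text_length`, `expand`, `lz_parse`, `lz_decode`) are hypothetical. The LZ variant used here is also an assumption: each phrase is the longest prefix of the remaining text that also starts at an earlier position, with overlaps allowed, or a single character if no such prefix exists.

```python
def fibonacci_slp(k):
    """Toy SLP with k rules (k >= 2) generating the k-th Fibonacci word.

    A rule maps a nonterminal to one character (terminal rule)
    or to a pair of nonterminals (binary rule).
    """
    rules = {"X1": "b", "X2": "a"}
    for i in range(3, k + 1):
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 2}")
    return rules, f"X{k}"


def text_length(rules, start):
    """Length n of the generated text, computed without decompressing."""
    memo = {}

    def length(sym):
        if sym not in memo:
            rhs = rules[sym]
            memo[sym] = 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])
        return memo[sym]

    return length(start)


def expand(rules, sym):
    """Fully decompress the SLP (uses Theta(n) space; for illustration only)."""
    rhs = rules[sym]
    if isinstance(rhs, str):
        return rhs
    return expand(rules, rhs[0]) + expand(rules, rhs[1])


def lz_parse(text):
    """Naive greedy LZ parse: literals are str, copies are (source, length)."""
    phrases, i, n = [], 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # candidate sources start strictly before i
            l = 0
            while i + l < n and text[j + l] == text[i + l]:
                l += 1  # the source may overlap the phrase being built
            if l > best_len:
                best_len, best_src = l, j
        if best_len == 0:
            phrases.append(text[i])  # first occurrence of a character
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz_decode(phrases):
    """Invert lz_parse, copying character by character to handle overlaps."""
    out = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            src, length = p
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


if __name__ == "__main__":
    rules, start = fibonacci_slp(7)
    text = expand(rules, start)
    phrases = lz_parse(text)
    print(len(rules), text_length(rules, start), len(phrases))
    print(lz_decode(phrases) == text)
```

For `fibonacci_slp(7)` the SLP has \(g = 7\) rules, the text has length \(n = 13\), and this parser produces \(z = 6\) phrases. Because the text length of this family grows exponentially in the number of rules, any method that first expands the text quickly becomes infeasible, which is the motivation for working within \(O(g)\) space.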