Substring Compression Variations and LZ78-Derivates

Dominik Köppl
DOI: https://doi.org/10.1109/DCC58796.2024.00021
2024-09-23
Abstract: We propose algorithms computing the semi-greedy Lempel-Ziv 78 (LZ78), the Lempel-Ziv Double (LZD), and the Lempel-Ziv-Miller-Wegman (LZMW) factorizations in linear time for integer alphabets. For LZD and LZMW, we additionally propose data structures that can be constructed in linear time, which can solve the substring compression problems for these factorizations in time linear in the output size. For substring compression, we give results for lexparse and closed factorizations.
Data Structures and Algorithms
What problem does this paper attempt to address?
This paper addresses the **substring compression problem**: preprocess a given text T so that the compressed version of any substring T[i..j] can be computed efficiently. The author proposes algorithms and data structures for Lempel-Ziv 78 (LZ78) and its variants, such as LZD and LZMW. The main problems addressed are:

1. **Factorization Computation in Linear Time**:
   - Propose algorithms to compute the semi-greedy LZ78, LZD, and LZMW factorizations in linear time on an integer alphabet.
   - For LZD and LZMW, propose data structures that solve the substring compression problem in time linear in the output size.
2. **Substring Compression Problem**:
   - The substring compression problem asks to preprocess a given text T so that the compressed version of any substring T[i..j] of T can be computed efficiently.
   - The author studies this problem for variants of LZ78 (such as LZD and LZMW) and proposes corresponding solutions.
3. **Space and Query Time Optimization**:
   - Answer queries in time linear in the output size using O(n lg n) bits of space.
   - Avoid Ω(n²) space, so that large inputs remain feasible.
4. **Research on Specific Factorization Variants**:
   - Study three variants of LZ78: flexible parsing, LZMW, and LZD.
   - Analyze these variants and propose efficient construction algorithms and substring compression methods.

### Specific Contributions

- **Substring Compression of LZD and LZMW**:
  - Propose algorithms that answer substring compression queries for LZD and LZMW in O(z) time, that is, linear in the output size, within O(n lg n) bits of space.
  - These data structures can be constructed in linear time, which gives the first deterministic linear-time method for computing LZD and LZMW.
- **Flexible Parsing**:
  - Propose algorithms for two different styles of LZ78 flexible parsing and show how to use AC automata and suffix trees to compute these two parsings.
- **Application of Suffix Trees**:
  - Use suffix trees to compute the LZD and LZMW parsings, which improves the computation time.

### Summary

The main objective of this paper is to improve the time and space needed for substring compression by improving the factorization algorithms and data structures for LZ78 and its variants. In particular, for LZD and LZMW, the author proposes new linear-time algorithms and data structures that answer substring compression queries in time linear in the output size.
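
To make the three factorizations concrete, the following is a minimal Python sketch of their textbook definitions. It is a naive baseline with superlinear running time, not the linear-time algorithms or data structures proposed in the paper. All function names (`longest_prefix_in`, `lz78`, `lzd`, `lzmw`, `substring_compress`) are hypothetical and chosen only for illustration.

```python
def longest_prefix_in(text, pos, dictionary):
    """Longest prefix of text[pos:] found in `dictionary`;
    falls back to the single character text[pos]."""
    best = text[pos]
    for length in range(2, len(text) - pos + 1):
        if text[pos:pos + length] in dictionary:
            best = text[pos:pos + length]
    return best


def lz78(text):
    """LZ78: each factor is the longest previous factor plus one new character
    (the last factor may be a repeat if the text ends early)."""
    factors, seen, pos = [], set(), 0
    while pos < len(text):
        end = pos + 1
        # The factor set is prefix-closed, so extend while the candidate is known.
        while end < len(text) and text[pos:end] in seen:
            end += 1
        factor = text[pos:end]
        seen.add(factor)
        factors.append(factor)
        pos = end
    return factors


def lzd(text):
    """LZD: each factor is the concatenation of two greedy matches, each being
    the longest previous factor (or a single character) at its position."""
    factors, seen, pos = [], set(), 0
    while pos < len(text):
        first = longest_prefix_in(text, pos, seen)
        pos += len(first)
        second = longest_prefix_in(text, pos, seen) if pos < len(text) else ""
        pos += len(second)
        factors.append(first + second)
        seen.add(first + second)
    return factors


def lzmw(text):
    """LZMW: each factor is the longest prefix that equals the concatenation of
    two adjacent earlier factors, or a single character."""
    factors, pairs, pos = [], set(), 0
    while pos < len(text):
        factor = longest_prefix_in(text, pos, pairs)
        if factors:
            pairs.add(factors[-1] + factor)  # new dictionary entry F_{x-1} F_x
        factors.append(factor)
        pos += len(factor)
    return factors


def substring_compress(text, i, j, factorize):
    """Naive substring compression query: factorize T[i..j] (0-based, inclusive)
    from scratch, without any preprocessing of the text."""
    return factorize(text[i:j + 1])


if __name__ == "__main__":
    t = "abaabab"
    print(lz78(t))
    print(lzd(t))
    print(lzmw(t))
    print(substring_compress(t, 2, 6, lzd))
```

The naive query recomputes the factorization of T[i..j] on every call, so its cost depends on the substring length. The data structures summarized above instead preprocess T once and then answer each query in time proportional to the number of factors returned.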