Abstract: Sparse suffix sorting is the problem of sorting $b=o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in text indexing, the existing algorithms have not been employed by practitioners. Arguably this is because there are no simple, direct, and efficient algorithms for sparse suffix array construction. We provide two new algorithms for constructing the sparse suffix and LCP arrays that are simultaneously simple, direct, small, and fast. In particular, our algorithms are: simple in the sense that they can be implemented using only basic data structures; direct in the sense that the output arrays are not a byproduct of constructing the sparse suffix tree or an LCE data structure; fast in the sense that they run in $\mathcal{O}(n\log b)$ time, in the worst case, or in $\mathcal{O}(n)$ time, when the total number of suffixes with an LCP value greater than $2^{\lfloor \log \frac{n}{b} \rfloor + 1}-1$ is in $\mathcal{O}(b/\log b)$, matching the time of the optimal yet much more complicated algorithms [Gawrychowski and Kociumaka, SODA 2017; Birenzwige et al., SODA 2020]; and small in the sense that they can be implemented using only $8b+o(b)$ machine words. Our algorithms are non-trivial space-efficient adaptations of the Monte Carlo algorithm by I et al. for constructing the sparse suffix tree in $\mathcal{O}(n\log b)$ time [STACS 2014]. We provide extensive experiments to justify our claims on simplicity and on efficiency.
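To make the problem statement concrete, the following is a minimal sketch of what the sparse suffix array and sparse LCP array contain. It is a naive baseline that compares suffixes character by character, not either of the algorithms described in the abstract, and it does not attain their time or space bounds. The function names (`naive_sparse_sa_lcp`, `lcp`, `lcp_threshold`) are hypothetical, the convention that the first LCP entry is 0 is an assumption, and the logarithm in the threshold is assumed to be base 2.

```python
from typing import List, Tuple


def lcp(text: str, i: int, j: int) -> int:
    """Length of the longest common prefix of text[i:] and text[j:]."""
    n = len(text)
    k = 0
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


def naive_sparse_sa_lcp(text: str, positions: List[int]) -> Tuple[List[int], List[int]]:
    """Sort the b chosen suffixes lexicographically and compute adjacent LCPs.

    Naive baseline for illustration only: sorting compares whole suffixes,
    so the worst case is far above the O(n log b) bound quoted in the abstract.
    """
    # Sparse suffix array: the chosen start positions in suffix order.
    ssa = sorted(positions, key=lambda p: text[p:])
    # Sparse LCP array: entry r is the LCP of the suffixes ranked r-1 and r
    # (assumed convention: the first entry is 0).
    slcp = [0] + [lcp(text, ssa[r - 1], ssa[r]) for r in range(1, len(ssa))]
    return ssa, slcp


def lcp_threshold(n: int, b: int) -> int:
    """The value 2^(floor(log2(n/b)) + 1) - 1 from the abstract, for n >= b >= 1.

    Uses integer arithmetic: floor(log2(n/b)) == (n // b).bit_length() - 1.
    """
    return 2 ** (n // b).bit_length() - 1


if __name__ == "__main__":
    ssa, slcp = naive_sparse_sa_lcp("banana", [1, 3, 5])
    print(ssa, slcp)
    print(lcp_threshold(16, 4))
```

For the string `banana` with chosen positions 1, 3, and 5, the suffixes are `anana`, `ana`, and `a`, so the sketch orders them as positions 5, 3, 1. The threshold helper only evaluates the bound stated in the abstract; how the algorithms use that bound is not shown here.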

A Note on the Longest Common Compatible Prefix Problem for Partial Words

Efficient Algorithms for Finding a Longest Common Increasing Subsequence

The colored longest common prefix array computed via sequential scans

Longest Common Extensions with Wildcards: Trade-off and Applications

Small-space encoding LCE data structure with constant-time queries

A note on the longest common substring with $k$-mismatches problem

Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast

Space-time Trade-offs for the LCP Array of Wheeler DFAs

Computing the LCP Array of a Labeled Graph

On Abelian Longest Common Factor with and without RLE

Near-Optimal Quantum Algorithm for Finding the Longest Common Substring between Run-Length Encoded Strings

A Textbook Solution for Dynamic Strings

Efficient algorithms for the longest common subsequence in $k$-length substrings

Faster space-efficient STR-IC-LCS computation

A sublinear time quantum algorithm for longest common substring problem between run-length encoded strings

The Longest Wave Subsequence Problem: Generalizations of the Longest Increasing Subsequence Problem

Construction of Sparse Suffix Trees and LCE Indexes in Optimal Time and Space

The longest letter-duplicated subsequence and related problems

Faster Maximal Exact Matches with Lazy LCP Evaluation

Polynomial-time equivalences and refined algorithms for longest common subsequence variants