Bounded Edit Distance: Optimal Static and Dynamic Algorithms for Small Integer Weights

Egor Gorbachev, Tomasz Kociumaka
2024-08-09
Abstract: The edit distance of two strings is the minimum number of insertions, deletions, and substitutions needed to transform one string into the other. The textbook algorithm determines the edit distance of length-$n$ strings in $O(n^2)$ time, which is optimal up to subpolynomial factors under the Orthogonal Vectors Hypothesis (OVH). In the bounded version of the problem, parameterized by the edit distance $k$, the algorithm of Landau and Vishkin [JCSS'88] achieves $O(n+k^2)$ time, which is optimal as a function of $n$ and $k$. The dynamic version of the problem asks to maintain the edit distance of two strings that change dynamically, with each update modeled as an edit. A folklore approach supports updates in $\tilde O(k^2)$ time, where $\tilde O(\cdot)$ hides polylogarithmic factors. Recently, Charalampopoulos, Kociumaka, and Mozes [CPM'20] showed an algorithm with update time $\tilde O(n)$, which is optimal under OVH in terms of $n$. The update time of $\tilde O(\min\{n,k^2\})$ raised an exciting open question of whether $\tilde O(k)$ is possible; we answer it affirmatively. Our solution relies on tools originating from weighted edit distance, where the weight of each edit depends on the edit type and the characters involved. The textbook algorithm supports weights, but the Landau-Vishkin approach does not, and a simple $O(nk)$-time procedure long remained the fastest for bounded weighted edit distance. Only recently, Das et al. [STOC'23] provided an $O(n+k^5)$-time algorithm, whereas Cassis, Kociumaka, and Wellnitz [FOCS'23] presented an $\tilde O(n+\sqrt{nk^3})$-time solution and a matching conditional lower bound. In this paper, we show that, for integer edit weights between $0$ and $W$, weighted edit distance can be computed in $\tilde O(n+Wk^2)$ time and maintained dynamically in $\tilde O(W^2k)$ time per update. Our static algorithm can also be implemented in $\tilde O(n+k^{2.5})$ time.
Data Structures and Algorithms
What problem does this paper attempt to address?
The paper addresses two related problems:

1. **Dynamic Edit Distance Problem**:
   - **Background**: The edit distance (Levenshtein distance) is the minimum number of character insertions, deletions, and substitutions required to convert one string into another. The classic dynamic programming algorithm computes the edit distance of two strings of length \(n\) in \(O(n^2)\) time. In the dynamic version of the problem, the strings change over time, each update is a single edit, and the edit distance must be maintained after each update.
   - **Existing Methods**: A folklore approach combines the Landau-Vishkin algorithm with a dynamic string implementation that supports efficient substring equality queries, achieving a worst-case update time of \(\tilde O(k^2)\). Charalampopoulos, Kociumaka, and Mozes [CPM'20] achieved an update time of \(\tilde O(n)\) using Tiskin's framework. Whether \(\tilde O(k)\) is possible remained an open question.
   - **Paper Contribution**: The paper gives a deterministic dynamic algorithm that handles each update in \(O(k\log^4 n)\) time and can output an optimal edit sequence. This answers the open question affirmatively.
2. **Weighted Edit Distance Problem**:
   - **Background**: In weighted edit distance, the cost of each edit depends on the edit type and the characters involved. The classic dynamic programming algorithm handles arbitrary weights, but the Landau-Vishkin approach does not, and a simple \(O(nk)\)-time procedure long remained the fastest for the bounded weighted problem.
   - **Existing Methods**: Das et al. [STOC'23] gave an \(O(n + k^5)\)-time algorithm, and Cassis, Kociumaka, and Wellnitz [FOCS'23] gave an \(\tilde O(n+\sqrt{nk^3})\)-time algorithm together with a matching conditional lower bound.
   - **Paper Contribution**: For integer weights between \(0\) and \(W\), the paper computes the weighted edit distance in \(\tilde O(n+Wk^2)\) time in the static setting and handles each update in \(O(W^2k\log^4 n)\) time in the dynamic setting. The static algorithm can also be implemented in \(\tilde O(n+k^{2.5})\) time.

### Summary of Main Contributions

- **Dynamic Edit Distance**: A deterministic dynamic algorithm with \(O(k\log^4 n)\) time per update that can output an optimal edit sequence.
- **Weighted Edit Distance**: A static algorithm running in \(\tilde O(n+Wk^2)\) time and a dynamic algorithm with \(O(W^2k\log^4 n)\) time per update. When \(W\) is a constant, the update time is \(\tilde O(k)\).

### Technical Overview

- **Basic Concepts**: The paper formulates edit distance in terms of alignment graphs. The alignment graph of strings \(X\) and \(Y\) is a directed grid graph with vertex set \([0..|X|]\times[0..|Y|]\), whose edges represent insertions, deletions, and substitutions.
- **Dynamic Algorithm**: Building on the algorithm of Charalampopoulos, Kociumaka, and Mozes, the paper handles updates through a hierarchical decomposition of the strings and the construction of boundary distance matrices. Using the notion of self-edit distance, it reduces the problem to instances that satisfy a self-edit distance condition, which is what breaks the previous update-time barriers.

### Open Problems

- **Update Time in the Unweighted Case**: Whether the \(O(\log^4 n)\) factor in the update time can be reduced.
- **Dependence in the Weighted Case**: Whether the dependence on the maximum weight \(W\) can be reduced, especially in the dynamic update time.
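As a rough illustration of the alignment-graph view described in the Technical Overview, the following Python sketch computes bounded weighted edit distance with the simple \(O(nk)\)-time banded dynamic program. It is not the paper's algorithm. The names `bounded_weighted_edit_distance`, `unit_weight`, and `sub2_weight` are hypothetical, `None` stands for the empty symbol, and the sketch assumes that `weight(a, a) == 0` and that every actual edit costs at least 1, so that any alignment of cost at most \(k\) stays within \(k\) diagonals of the main diagonal.

```python
from typing import Callable, Optional

INF = float("inf")

Weight = Callable[[Optional[str], Optional[str]], int]


def unit_weight(a: Optional[str], b: Optional[str]) -> int:
    """Hypothetical unweighted cost: 0 for a match, 1 for any edit."""
    return 0 if a == b else 1


def sub2_weight(a: Optional[str], b: Optional[str]) -> int:
    """Hypothetical weights: insertions/deletions cost 1, substitutions cost 2."""
    if a == b:
        return 0
    return 1 if a is None or b is None else 2


def bounded_weighted_edit_distance(
    x: str, y: str, k: int, weight: Weight = unit_weight
) -> Optional[int]:
    """Return the weighted edit distance of x and y if it is at most k, else None.

    Assumption (not from the source): weight(a, a) == 0 and every real edit
    costs at least 1, so only vertices (i, j) with |i - j| <= k matter.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None
    prev: dict = {}  # costs for row i - 1 of the alignment graph, keyed by j
    for i in range(n + 1):
        cur: dict = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                best = 0
            else:
                best = INF
                if i > 0 and j in prev:  # vertical edge: delete x[i-1]
                    best = min(best, prev[j] + weight(x[i - 1], None))
                if j > 0 and (j - 1) in cur:  # horizontal edge: insert y[j-1]
                    best = min(best, cur[j - 1] + weight(None, y[j - 1]))
                if i > 0 and j > 0 and (j - 1) in prev:  # diagonal edge
                    best = min(best, prev[j - 1] + weight(x[i - 1], y[j - 1]))
            cur[j] = best
        prev = cur
    d = prev.get(m, INF)
    return d if d <= k else None
```

Each row touches at most \(2k+1\) vertices, which gives the \(O(nk)\) bound. Calling `bounded_weighted_edit_distance("kitten", "sitting", 3)` stays within the band, while a threshold of 2 makes the function report `None`.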