Language Edit Distance & Scored Parsing: Faster Algorithms & Connection to Fundamental Graph Problems

Tomasz Kociumaka, Barna Saha
2024-10-24
Abstract: Given a context-free language $\mathcal{L}(G)$ over an alphabet $\Sigma$ and a string $s \in \Sigma^*$, the language edit distance problem seeks the minimum number of edits (insertions, deletions, and substitutions) required to convert $s$ into a valid member of $\mathcal{L}(G)$. The well-known dynamic programming algorithm solves this problem in $O(n^3)$ time (ignoring grammar size), where $n$ is the string length [Aho, Peterson 1972, Myers 1985]. Despite the problem's numerous applications, no known algorithm computes language edit distance, exactly or approximately, in truly subcubic time. In this paper we give the first algorithm that approximates language edit distance in truly subcubic time. For any $\epsilon > 0$, our algorithm runs in $\tilde{O}(\frac{n^{2.491}}{\epsilon^2})$ time and returns an estimate within a multiplicative approximation factor of $(1+\epsilon)$. Moreover, an additive $\epsilon n$ approximation can be computed in $O(\frac{n^2}{\epsilon^{0.825}})$ time. To complement our upper bounds, we show that computing language edit distance exactly in truly subcubic time, even with insertion-only edits, would imply a truly subcubic algorithm for all-pairs shortest paths, which is a long-standing open question in computer science.
Data Structures and Algorithms, Formal Languages and Automata Theory
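
For orientation, the cubic-time baseline mentioned in the abstract can be sketched as an interval dynamic program. This is a minimal illustration, not the paper's subcubic algorithm and not the exact formulation of Aho and Peterson or Myers. It assumes the grammar is given in Chomsky normal form without epsilon rules and that every nonterminal derives at least one string. The function name `language_edit_distance`, the rule encoding (`unary` as pairs `(A, a)`, `binary` as triples `(A, B, C)`), and the example grammar `DYCK_UNARY` / `DYCK_BINARY` are hypothetical names chosen for this sketch.

```python
from math import inf


def language_edit_distance(s, start, unary, binary):
    """Minimum edits turning s into a string derivable from `start`.

    Assumed encoding: unary holds rules A -> a as (A, a);
    binary holds rules A -> B C as (A, B, C).
    """
    nts = {A for A, _ in unary} | {X for rule in binary for X in rule}

    # ins[A]: length of the shortest string derivable from A, i.e. the
    # cost of producing A from an empty span using insertions only.
    ins = {A: inf for A in nts}
    for A, _ in unary:
        ins[A] = 1
    changed = True
    while changed:
        changed = False
        for A, B, C in binary:
            if ins[B] + ins[C] < ins[A]:
                ins[A] = ins[B] + ins[C]
                changed = True

    n = len(s)
    if n == 0:
        return ins[start]

    D = {}  # D[A, i, j]: cost of editing s[i:j] into a string derived from A
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = {A: inf for A in nts}
            # A -> a: keep (or substitute) one symbol, delete the rest.
            for A, a in unary:
                cost = length - 1 + (0 if a in s[i:j] else 1)
                best[A] = min(best[A], cost)
            # A -> B C with both parts covering a nonempty piece of the span.
            for A, B, C in binary:
                for k in range(i + 1, j):
                    best[A] = min(best[A], D[B, i, k] + D[C, k, j])
            # A -> B C where one part is fully inserted; rules on the same
            # span depend on each other, so relax until nothing changes.
            changed = True
            while changed:
                changed = False
                for A, B, C in binary:
                    c = min(ins[B] + best[C], best[B] + ins[C])
                    if c < best[A]:
                        best[A] = c
                        changed = True
            for A in nts:
                D[A, i, j] = best[A]
    return D[start, 0, n]


# Example grammar (an assumption for illustration): nonempty balanced
# parentheses, S -> S S | ( S ) | ( ), written in Chomsky normal form.
DYCK_UNARY = [("L", "("), ("R", ")")]
DYCK_BINARY = [("S", "S", "S"), ("S", "L", "R"), ("S", "L", "T"), ("T", "S", "R")]

if __name__ == "__main__":
    for text in ["()", "(()", ")("]:
        print(text, language_edit_distance(text, "S", DYCK_UNARY, DYCK_BINARY))
```

The triple loop over span length, start position, and split point gives the $O(n^3)$ dependence on the string length; the grammar only contributes a factor that depends on the number of rules and nonterminals.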