Edit Distance for Pushdown Automata

Krishnendu Chatterjee,Thomas A. Henzinger,Rasmus Ibsen-Jensen,Jan Otop
DOI: https://doi.org/10.23638/LMCS-13%283%3A23%292017
2017-09-22
Abstract: The edit distance between two words $w_1, w_2$ is the minimal number of word operations (letter insertions, deletions, and substitutions) necessary to transform $w_1$ into $w_2$. The edit distance generalizes to languages $\mathcal{L}_1, \mathcal{L}_2$, where the edit distance from $\mathcal{L}_1$ to $\mathcal{L}_2$ is the minimal number $k$ such that for every word from $\mathcal{L}_1$ there exists a word in $\mathcal{L}_2$ with edit distance at most $k$. We study the edit distance computation problem between pushdown automata and their subclasses. The problem of computing edit distance to a pushdown automaton is undecidable, and in practice, the interesting question is to compute the edit distance from a pushdown automaton (the implementation, a standard model for programs with recursion) to a regular language (the specification). In this work, we present a complete picture of decidability and complexity for the following problems: (1) deciding whether, for a given threshold $k$, the edit distance from a pushdown automaton to a finite automaton is at most $k$, and (2) deciding whether the edit distance from a pushdown automaton to a finite automaton is finite.
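The two definitions in the abstract can be made concrete with a minimal sketch. It assumes both languages are finite, non-empty sets of strings, which is only an illustration: the paper itself deals with infinite languages given by automata. The function names `edit_distance` and `language_edit_distance` are hypothetical and chosen here for readability.

```python
import math


def edit_distance(w1: str, w2: str) -> int:
    """Minimal number of insertions, deletions, and substitutions turning w1 into w2."""
    # prev[j] holds the distance between the current prefix of w1 and w2[:j].
    prev = list(range(len(w2) + 1))
    for i, a in enumerate(w1, start=1):
        curr = [i]  # turning w1[:i] into the empty word takes i deletions
        for j, b in enumerate(w2, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute a by b, or keep a match
            ))
        prev = curr
    return prev[-1]


def language_edit_distance(l1, l2):
    """Least k such that every word of l1 has a word of l2 within distance k.

    Illustration for finite sets only (an assumption of this sketch).
    """
    if not l1:
        return 0          # nothing to repair
    if not l2:
        return math.inf   # no target word exists
    return max(min(edit_distance(w1, w2) for w2 in l2) for w1 in l1)
```

For example, `language_edit_distance({"ab", "abc"}, {"ab"})` is 1 while `language_edit_distance({"ab"}, {"ab", "abc"})` is 0, which shows that the distance from one language to another is not symmetric.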
Formal Languages and Automata Theory
What problem does this paper attempt to address?
This paper studies the computational complexity of computing the **edit distance** between languages given by different types of automata. Its main focus is the edit distance from **pushdown automata (PDA)** to **finite automata (DFA/NFA)**; the reverse direction, from finite automata to pushdown automata, is undecidable according to the abstract. The edit distance between two words is the minimum number of operations (inserting, deleting, or substituting letters) needed to transform one word into the other. The edit distance from a language \(\mathcal{L}_1\) to a language \(\mathcal{L}_2\) is the smallest \(k\) such that every word in \(\mathcal{L}_1\) is within edit distance \(k\) of some word in \(\mathcal{L}_2\).

### Main research questions:

1. **Threshold Edit Distance (TED) problem**:
   - Given two automata \(A_1\) and \(A_2\) and an integer threshold \(k\), determine whether the edit distance from \(A_1\) to \(A_2\) is at most \(k\).
   - The paper studies the complexity of the TED problem when \(A_1\) is a pushdown automaton (DPDA or PDA) and \(A_2\) is a finite automaton (DFA or NFA).
2. **Finite Edit Distance (FED) problem**:
   - Determine whether the edit distance from \(A_1\) to \(A_2\) is finite.
   - The paper also studies the complexity of the FED problem when \(A_1\) is a pushdown automaton (DPDA or PDA) and \(A_2\) is a finite automaton (DFA or NFA).

### Research background:

- Edit distance is widely used in computer science, especially in error-correcting codes, natural language processing, and computational biology.
- In verification, the traditional language inclusion problem (i.e., determining whether the language of one automaton includes the language of another automaton) can be extended quantitatively through the edit distance problem.

### Main contributions:

1. **Complexity of the TED problem**:
   - When the source language is given by a DPDA or PDA and the target language is given by a DFA or NFA, the TED problem is proved to be ExpTime-complete.
   - Matching upper and lower bounds establish this complexity.
2. **Complexity of the FED problem**:
   - When the source language is given by a DPDA or PDA and the target language is given by a DFA, the FED problem is proved to be coNP-complete.
   - When the target language is given by an NFA, the FED problem is ExpTime-complete.

### Related work:

- The paper also reviews previous research on edit distance, including edit distance algorithms for words, regular languages, timed automata, and straight-line programs.

Together, these results give a complete decidability and complexity picture for the TED and FED problems from pushdown automata to finite automata, and provide a theoretical basis for comparing recursive programs against regular specifications.
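To give a feel for the threshold question with a finite automaton as the target, the sketch below handles only a much simpler special case: the edit distance from a single word to the language of an NFA, computed as a shortest path over pairs (position in the word, NFA state). This is the classical construction for one word and is not the paper's algorithm, which must account for every word accepted by a pushdown automaton at once. The NFA encoding (a dict from state to a list of `(letter, next_state)` pairs) and the names `distance_to_nfa` and `within_threshold` are assumptions of this illustration.

```python
import heapq
import math


def distance_to_nfa(word, transitions, initial, accepting):
    """Edit distance from `word` to the language of an NFA (no epsilon moves).

    Dijkstra over nodes (i, q): i letters of `word` consumed, NFA in state q.
    """
    n = len(word)
    done = set()
    heap = [(0, 0, q) for q in initial]
    heapq.heapify(heap)
    while heap:
        d, i, q = heapq.heappop(heap)
        if (i, q) in done:
            continue
        done.add((i, q))
        if i == n and q in accepting:
            return d  # first accepting node popped has minimal cost
        if i < n:
            heapq.heappush(heap, (d + 1, i + 1, q))  # delete word[i]
        for letter, nxt in transitions.get(q, ()):
            heapq.heappush(heap, (d + 1, i, nxt))    # insert `letter`
            if i < n:
                # match (cost 0) or substitute word[i] by `letter` (cost 1)
                heapq.heappush(heap, (d + (letter != word[i]), i + 1, nxt))
    return math.inf  # the NFA accepts no word


def within_threshold(word, k, transitions, initial, accepting):
    """Single-word analogue of the TED question: is the distance at most k?"""
    return distance_to_nfa(word, transitions, initial, accepting) <= k


# Hypothetical example NFA for the language (ab)*: state 0 is initial and accepting.
AB_STAR = {0: [("a", 1)], 1: [("b", 0)]}
```

With this example automaton, `distance_to_nfa("aba", AB_STAR, {0}, {0})` is 1, since deleting the final `a` yields `ab`. The result is finite for every word whenever the NFA accepts at least one word; the FED problem is only interesting because the source language may contain infinitely many words whose distances grow without bound.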