Abstract: We introduce deterministic suffix-reading automata (DSA), a new automaton model over finite words. Transitions in a DSA are labeled with words. From a state, a DSA triggers an outgoing transition on seeing a word ending with the transition's label. Therefore, rather than moving along an input word letter by letter, a DSA can jump along blocks of letters, with each block ending in a suitable suffix. This feature allows DSAs to recognize regular languages more concisely than DFAs. In this work, we focus on questions around finding a "minimal" DSA for a regular language. The number of states is not a faithful measure of the size of a DSA, since the transition labels contain strings of arbitrary length. Hence, we consider total-size (number of states + number of edges + total length of transition labels) as the size measure of DSAs.
We start by formally defining the model and providing a DSA-to-DFA conversion that allows us to compare the expressiveness and succinctness of DSAs with related automata models. Our main technical contribution is a method to derive DSAs from a given DFA: a DFA-to-DSA conversion. We make the surprising observation that the smallest DSA derived from the canonical DFA of a regular language L need not be a minimal DSA for L. This observation leads to a fundamental bottleneck in deriving a minimal DSA for a regular language. In fact, we prove that given a DFA and a number k, the problem of deciding if there exists an equivalent DSA of total-size at most k is NP-complete.
What problem does this paper attempt to address?
The paper introduces a new automaton model, deterministic suffix-reading automata (DSA), and studies how to construct a smallest DSA from a given deterministic finite automaton (DFA). The main research problems are:
1. **Definition and Formal Description**:
- Deterministic suffix-reading automata (DSA) are a new automaton model whose transitions are labeled with words. Unlike a traditional deterministic finite automaton (DFA), which moves letter by letter, a DSA triggers a transition as soon as it has read a word ending with that transition's label.
- The size of a DSA is measured by its total-size: the number of states plus the number of edges plus the total length of the transition labels.
2. **Expressive Power and Conciseness**:
- Show that DSAs recognize regular languages and can represent some of them more concisely than DFAs.
- Provide a conversion from DSA to DFA, which makes it possible to compare the expressiveness and succinctness of DSAs with related automaton models.
3. **Minimization Problem**:
- Explore how to construct the smallest DSA from a given DFA. The paper finds that the smallest DSA derived from the canonical DFA of a language is not necessarily a minimal DSA for that language, which reveals a fundamental bottleneck.
- Prove that, given a DFA and an integer \( k \), deciding whether there exists an equivalent DSA of total-size at most \( k \) is NP-complete.
4. **Technical Contributions**:
- Propose a method for constructing DSA from DFA, and ensure language equivalence by selecting appropriate subsets of states and adding transitions.
- Identify sufficient conditions for the derivation process to maintain language equivalence, especially regarding the concept of "suffix-compatible transitions".
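The suffix-reading behaviour described in item 1 can be sketched in Python. This is a minimal illustration, not the paper's construction: the function name `dsa_accepts`, the dictionary encoding of transitions, and the example automaton are hypothetical. Two points are assumptions about the formal model: no label leaving a state is a suffix of another label leaving the same state (so at most one transition can fire), and a word is accepted only if its last block ends exactly at the end of the word in an accepting state.

```python
def dsa_accepts(transitions, initial, accepting, word):
    """Simulate a DSA run on `word`.

    transitions: dict mapping a state to a list of (label, target) pairs,
                 where each label is a non-empty string (hypothetical encoding).
    """
    state = initial
    block_start = 0  # index where the current block of letters began
    for i in range(1, len(word) + 1):
        block = word[block_start:i]
        for label, target in transitions.get(state, []):
            # Fire as soon as the block read so far ends with a label.
            if block.endswith(label):
                state = target
                block_start = i  # start a fresh block after the jump
                break
    # Assumption: leftover letters that trigger no transition mean rejection.
    return block_start == len(word) and state in accepting


# Hypothetical DSA for "words over {a, b} that contain ab".
EXAMPLE = {
    "q0": [("ab", "q1")],
    "q1": [("a", "q1"), ("b", "q1")],
}

print(dsa_accepts(EXAMPLE, "q0", {"q1"}, "bbab"))
print(dsa_accepts(EXAMPLE, "q0", {"q1"}, "bba"))
```

The example uses two states, whereas a letter-by-letter DFA for the same language would need a third state to remember a partially matched `a`.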
### Formula Summary
- **Size Measurement**:
\[
|A| = |Q| + |\Delta| + \sum_{q \in Q} |Out(q)|
\]
where \( |Q| \) is the number of states, \( |\Delta| \) is the number of transitions, and \( |Out(q)| \) is the total length of all transition labels starting from state \( q \).
- **Complexity Relationship**:
\[
\frac{n_{\text{cmp}}^F}{2(1 + 2|\Sigma|)} \leq n_S \leq n_{\text{cmp}}^F
\]
where \( n_{\text{cmp}}^F \) is the size of the minimum complete DFA, \( n_S \) is the size of the minimum DSA, and \( |\Sigma| \) is the size of the alphabet.
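The size measure \( |A| \) above is a plain sum, so it can be computed directly from a transition table. The sketch below assumes a hypothetical dictionary encoding (state to list of `(label, target)` pairs) and a hypothetical two-state example; neither comes from the paper.

```python
def total_size(states, transitions):
    """Total-size: |Q| + |Delta| + total length of all transition labels."""
    num_edges = sum(len(out) for out in transitions.values())
    label_length = sum(
        len(label) for out in transitions.values() for label, _target in out
    )
    return len(states) + num_edges + label_length


# Hypothetical DSA: q0 --ab--> q1, q1 --a--> q1, q1 --b--> q1.
EXAMPLE = {
    "q0": [("ab", "q1")],
    "q1": [("a", "q1"), ("b", "q1")],
}

# 2 states + 3 edges + label lengths (2 + 1 + 1) = 9
print(total_size(["q0", "q1"], EXAMPLE))
```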
### Conclusion
By introducing the DSA model, the paper provides a new way to represent and process regular languages, one that is more concise especially for languages with large alphabets or dense patterns. However, finding the smallest DSA remains a challenging problem: the smallest DSA derived from a canonical DFA is not guaranteed to be minimal, and the associated decision problem is NP-complete.