A Report on Achieving Complete Regular-Expression Matching using Mealy Machines

Ricardo Almeida
DOI: https://doi.org/10.48550/arXiv.2206.04944
2022-06-10
Formal Languages and Automata Theory
Abstract: While regexp matching is a powerful mechanism for finding patterns in data streams, regexp engines in general only find matches that do not overlap. Moreover, different forms of nondeterministic exploration, where symbols read are processed more than once, are often used, which can be costly in real-time matching. We present an algorithm that constructs from any regexp a Mealy machine that finds all matches while reading each input symbol only once. The machine computed can also detect and distinguish different patterns or sub-patterns inside patterns. Additionally, we show how to compute a minimal Mealy machine via a variation of DFA minimization, by formalizing Mealy machines in terms of regular languages.
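
To illustrate the gap the abstract describes, here is a minimal Python sketch. It is not the paper's construction: the transition table below is a hypothetical, hand-written Mealy machine for the single pattern "aba" over the alphabet {a, b}, and the names TRANSITIONS and match_end_positions are assumptions made for this example. It contrasts the non-overlapping behaviour of Python's re.findall with a machine that reads each symbol once and emits an output whenever a match ends.

```python
import re

# Hypothetical hand-written Mealy machine for the pattern "aba" over {a, b}.
# TRANSITIONS[state][symbol] = (next_state, output).
# The state counts how many leading symbols of "aba" are currently matched;
# output 1 means "an occurrence of the pattern ends at this symbol".
TRANSITIONS = {
    0: {"a": (1, 0), "b": (0, 0)},
    1: {"a": (1, 0), "b": (2, 0)},
    2: {"a": (1, 1), "b": (0, 0)},  # after "aba", the trailing "a" is kept
}

def match_end_positions(text):
    """Return the indices where a match ends, reading each symbol once."""
    state = 0
    ends = []
    for i, symbol in enumerate(text):
        state, output = TRANSITIONS[state][symbol]
        if output:
            ends.append(i)
    return ends

# A standard engine reports only non-overlapping matches.
non_overlapping = re.findall("aba", "ababa")

# The machine reports both overlapping occurrences.
all_ends = match_end_positions("ababa")

print(non_overlapping)  # ['aba']
print(all_ends)         # [2, 4]
```

On the input "ababa" the regexp engine reports one match, while the machine reports matches ending at indices 2 and 4, without revisiting any symbol.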