Non-erasing Chomsky-Schützenberger theorem with grammar-independent alphabet

Stefano Crespi Reghizzi, Pierluigi San Pietro
DOI: https://doi.org/10.48550/arXiv.1805.04003
2018-05-10
Abstract: The famous theorem by Chomsky and Schützenberger (CST) says that every context-free language $L$ over an alphabet $\Sigma$ is representable as $h(D \cap R)$, where $D$ is a Dyck language over a set $\Omega$ of brackets, $R$ is a local language and $h$ is an alphabetic homomorphism that erases unboundedly many symbols. Berstel found that the number of erasures can be linearly limited if the grammar is in Greibach normal form; Berstel and Boasson (and later, independently, Okhotin) proved a non-erasing variant of CST for grammars in Double Greibach Normal Form. In all these CST statements, however, the size of the Dyck alphabet $\Omega$ depends on the grammar size for $L$. In the Stanley variant of the CST, $|\Omega|$ only depends on $|\Sigma|$ and not on the grammar, but the homomorphism erases many more symbols than in the other versions of CST; also, the regular language $R$ is strictly locally testable but not local. We prove a new version of CST which combines both features of being non-erasing and of using a grammar-independent alphabet. In our construction, $|\Omega|$ is polynomial in $|\Sigma|$, namely $O(|\Sigma|^{46})$, and the regular language $R$ is strictly locally testable. Using a recent generalization of Medvedev's homomorphic characterization of regular languages, we prove that the degree in the polynomial dependence of $|\Omega|$ on $|\Sigma|$ may be reduced to just 2 in the case of linear grammars in Double Greibach Normal Form.
Formal Languages and Automata Theory
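The representation $h(D \cap R)$ from the abstract can be made concrete on a toy language. The minimal Python sketch below is an illustration under stated assumptions, not the paper's construction: it takes $L = \{a^n b^n : n \ge 1\}$, a Dyck alphabet $\Omega$ with a single bracket pair, a local language $R$ given by allowed first letters, last letters and 2-letter factors, and a letter-to-letter homomorphism $h$ that happens to erase nothing. The names `OMEGA`, `FIRST`, `LAST`, `FACTORS`, `H` and `image` are hypothetical, and the check is a brute-force enumeration up to a length bound rather than a proof.

```python
from itertools import product

# Toy Dyck alphabet Omega: one bracket pair (hypothetical, not the paper's Omega).
OMEGA = "()"

# Local language R: allowed first letters, last letters and 2-letter factors.
# Forbidding the factor ")(" makes R = (^i )^j with i, j >= 1.
FIRST = {"("}
LAST = {")"}
FACTORS = {"((", "()", "))"}

# Alphabetic homomorphism h: Omega -> Sigma (this toy h erases nothing).
H = {"(": "a", ")": "b"}


def is_dyck(w: str) -> bool:
    """True iff w is a well-balanced word over the single bracket pair."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def in_local(w: str) -> bool:
    """True iff w is nonempty, starts in FIRST, ends in LAST, uses only FACTORS."""
    return (
        len(w) > 0
        and w[0] in FIRST
        and w[-1] in LAST
        and all(w[i:i + 2] in FACTORS for i in range(len(w) - 1))
    )


def h(w: str) -> str:
    """Apply the homomorphism letter by letter."""
    return "".join(H[c] for c in w)


def image(max_len: int) -> set:
    """Compute h(D ∩ R) restricted to bracket words of length <= max_len."""
    result = set()
    for n in range(1, max_len + 1):
        for letters in product(OMEGA, repeat=n):
            w = "".join(letters)
            if is_dyck(w) and in_local(w):
                result.add(h(w))
    return result


# Target language L = { a^n b^n : n >= 1 }, truncated to the same length bound.
expected = {"a" * n + "b" * n for n in range(1, 6)}
print(image(10) == expected)  # True
```

Here $D \cap R$ is exactly $\{(^n\, )^n : n \ge 1\}$: the word `()()` is a Dyck word but is excluded by $R$ because it contains the factor `)(`. In the general theorem, $\Omega$, $R$ and $h$ come from a grammar for $L$, and keeping $h$ non-erasing while making $|\Omega|$ independent of the grammar is precisely the difficulty the paper addresses.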