Two Approaches to Diachronic Normalization of Polish Texts
Kacper Dudzic, Filip Graliński, Krzysztof Jassem, Marek Kubis, Piotr Wierzchoń
2024-02-03
Abstract: This paper discusses two approaches to the diachronic normalization of Polish
texts: a rule-based solution that relies on a set of handcrafted patterns, and
a neural normalization model based on the text-to-text transfer transformer
architecture. The training and evaluation data prepared for the task are
discussed in detail, along with experiments conducted to compare the proposed
normalization solutions. Both quantitative and qualitative analyses are
presented. They show that, at the current stage of inquiry into the problem, the
rule-based solution outperforms the neural one on 3 out of 4 variants of the prepared
dataset, although in practice both approaches have distinct advantages and
disadvantages.
Computation and Language