Automatic dependency parsing of Estonian: what linguistic features to include?

Sven Laur, Siim Orasmaa, Sandra Eiche, Dage Särg
DOI: https://doi.org/10.1007/s10579-024-09779-z
2024-10-11
Language Resources and Evaluation
Abstract: In this paper, we investigate the role of linguistically motivated input features and pre-processing in advancing dependency parsing of Estonian. In particular, we focus on parsers that take morphological features as explicit inputs, and investigate the effect of a) training data size, b) the choice of lexical and morphological features, and c) clausal syntactic patterns on developing such parsers. While our work indicates that further advancements through a naive increase of training data are hard to obtain, we still confirm the high utility of automatically generated morphological features in the parser's input. Our ablation studies indicate that knowledge of subcategorisation constructions is crucial for parsing, and that a targeted search for subcategorisation constructions may lead to more straightforward and effective input features. We also show that decomposing sentences into simpler structures via clausal patterns can lead to performance gains, and note that the distribution of clausal subtrees should be considered when increasing training data.
computer science, interdisciplinary applications