A study of universal morphological analysis using morpheme-based, holistic, and neural approaches under various data size conditions
Lepage, Yves
DOI: https://doi.org/10.1007/s10472-024-09944-8
IF: 1.019
2024-05-12
Annals of Mathematics and Artificial Intelligence
Abstract: We study the universal morphological analysis task: given a word form, generate its lemma (lemmatisation) and its corresponding morphosyntactic description (MSD analysis). Experiments are carried out on the dataset of the SIGMORPHON 2018 Shared Task: Morphological Reinflection Task, which covers more than 100 languages of varying morphological richness under three data size conditions: low, medium and high. We consider three main approaches: morpheme-based (eager learning), holistic (lazy learning), and neural (eager learning). Performance is evaluated on the two subtasks of lemmatisation and MSD analysis. For the lemmatisation subtask, under all three data sizes, experimental results show that the holistic approach predicts more accurate lemmata, while the morpheme-based approach, when it is wrong, produces lemmata that are closer to the correct answers. For the MSD analysis subtask, under all three data sizes, the holistic approach achieves higher recall, while the morpheme-based approach is more precise. However, the trade-off between precision and recall in the two systems leads to very similar overall F1 scores. On the whole, neural approaches suffer under low-resource conditions, but they achieve the best performance of all the approaches as the size of the training data increases.
computer science, artificial intelligence; mathematics, applied
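
The abstract compares systems by how close a wrong lemma is to the correct one and by precision, recall and F1 on MSD analysis. The following is a minimal sketch of how such measures could be computed, not the authors' evaluation code. It assumes that lemma closeness is measured with Levenshtein edit distance and that an MSD is a string of features separated by semicolons (for example `V;PST;3;SG`); the function names `levenshtein` and `msd_prf` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute, cost 1 each).

    Assumption: used here as the measure of how close a predicted lemma
    is to the reference lemma.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def msd_prf(predicted: str, gold: str, sep: str = ";") -> tuple[float, float, float]:
    """Precision, recall and F1 over the feature sets of two MSD strings.

    Assumption: features are separated by `sep` and compared as sets.
    """
    pred = set(predicted.split(sep)) if predicted else set()
    ref = set(gold.split(sep)) if gold else set()
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # A wrong lemma that is still close to the reference.
    print(levenshtein("ran", "run"))
    # A cautious prediction: every feature is correct, but some are missing.
    print(msd_prf("V;PST", "V;PST;3;SG"))
    # A generous prediction: all gold features are found, plus spurious ones.
    print(msd_prf("V;PST;3;SG;IND;PL", "V;PST;3;SG"))
```

The two `msd_prf` calls illustrate the trade-off described in the abstract: a system that predicts few features tends towards high precision and lower recall, while one that predicts many features tends towards the opposite, and F1 combines the two as their harmonic mean.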