Corpus linguistics and generative AI tools in term extraction: a case of Kashubian – a low-resource language

Marek Łukasik, Pomeranian University in Słupsk
DOI: https://doi.org/10.32612/uw.25449354.2023.4.pp.34-45
2023-12-18
Applied Linguistics Papers
Abstract: Electronic corpora have been an indispensable resource in a variety of language-related disciplines, including linguistics, lexicography and terminology. Provided that they are compiled systematically, such text collections can supply high-quality data that can be readily used in a specific study or applied directly to a practical project. However, the creation of a usable corpus depends on the availability and quality of source texts and of the tools used to process them. Another factor that often plays a significant role in the successful ad hoc use of corpora is their immediate accessibility. Recent developments in generative artificial intelligence (GenAI) have made instantaneous access to language data a realistic prospect. This paper discusses the results of a study into the feasibility of applying modern corpus and GenAI tools to the extraction of biological terminology in Kashubian, a regional language spoken in the north-central part of Poland (Kashubia). The overarching goal was to identify modern and effective tools that could be used by terminologists, lexicographers, translators, and teachers of Kashubian.