Evaluation of the morphological rules for the Tenyidie language: a low-resource language
Teisovi Angami, Mimi Kevichüsa-Ezung, Sanasam Ranbir Singh, Themrichon Tuithung
DOI: https://doi.org/10.1007/s10579-024-09788-y
2024-11-28
Language Resources and Evaluation
Abstract: The Tenyidie language, also known as Angami, is a low-resource language of the Tibeto-Burman language family. It is spoken by the Tenyimia community and is considered a major language of Nagaland, in north-eastern India. Tenyidie is tonal, follows subject-object-verb (SOV) word order, and is highly agglutinative. Among Natural Language Processing (NLP) tasks, part-of-speech (POS) tagging is a primary task on which many others build, such as dependency parsing, named entity recognition, and machine translation. The main aim of this paper is to evaluate the morphological rules of Tenyidie by building a morphological rule-based POS tagger. The rules are evaluated on 158,403 annotated Tenyidie tokens. To the best of the authors' knowledge, there is no previously reported work on the evaluation of morphological rules for Tenyidie. The main contributions of this research are the evaluation of the existing morphological rules of Tenyidie through a morphological rule-based POS tagger, and the creation of an annotated dataset of 158,403 tokens for evaluating those rules. In addition, we introduce some new morphological rules.
computer science, interdisciplinary applications