A Multilingual Language Processing Tool for Uyghur, Kazak and Kirghiz

Mijit Ablimit,Sardar Parhat,Askar Hamdulla,Thomas Fang Zheng
DOI: https://doi.org/10.1109/apsipa.2017.8282131
2017-01-01
Abstract:Natural language processing for less popular languages is difficult, partly due to the high variations in the writing form. On the other hand, many minority languages in the same region share similar properties and can be processed in a similar way. This paper publishes an integrated multilingual language processing tool. Our aim is to provide an open, free and standard toolkit for minority language processing tasks, by a uniform user interface to support multiple languages. The present implementation supports Uyghur, Kazak, Kirghiz, three major minority languages in the Western China, and our focus was put on phonetic and morphological analysis. For the phonetic analysis, we build a multilingual parallel phoneme list, with similar phonemes grouped and character codes standardized. A multilingual syllable analyzer is also developed to detect spelling mistakes, and extract irregular spelling. For the morphological analysis, we build a multilingual morpheme segmentation tool that can extract morphemes by statistical analysis. This toolkit is extendable in terms of both functions and languages.
What problem does this paper attempt to address?