Mongolian Part-of-speech Tagging Approach Based on Conditional Random Fields

应玉龙,李淼,乌达巴拉,朱海
DOI: https://doi.org/10.3724/sp.j.1087.2010.02038
2010-01-01
Abstract: In Mongolian part-of-speech (POS) tagging, both stems and affixes need to be tagged, in order to preserve the rich syntactic and semantic information carried by affixes and to reduce the size of the Mongolian dictionary. This paper presents a new approach to Mongolian POS tagging based on conditional random fields (CRF). To take advantage of the CRF's ability to use arbitrary features as input, the system exploits not only the contexts of words but also new statistical features that capture the mutual influence between morphemes. The system was tested on the 38000 part-of-speech dataset provided by Inner Mongolia University. Closed-test results show that POS tagging accuracy on the test set reaches 96.65%, outperforming the HMM-based model.
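As a rough illustration of the idea of feeding a CRF both context features and features that relate morphemes to one another, the sketch below builds one feature dictionary per morpheme. Everything in it is an assumption made for illustration: the abstract does not specify the feature templates, so the names `morpheme_features`, `host_stem` and `stem+affix`, the stem/affix labelling of the input, and the placeholder tokens are hypothetical, and the simple stem-affix conjunction stands in for the paper's unspecified statistical features.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (surface form, kind), where kind is "stem" or "affix".
Morpheme = Tuple[str, str]


def morpheme_features(seq: List[Morpheme], i: int) -> Dict[str, str]:
    """Build an illustrative feature dict for the morpheme at position i."""
    surface, kind = seq[i]
    feats = {"bias": "1", "cur": surface, "kind": kind}

    # Context features: the neighbouring morphemes and their kinds.
    if i > 0:
        feats["prev"] = seq[i - 1][0]
        feats["prev_kind"] = seq[i - 1][1]
    else:
        feats["BOS"] = "1"  # beginning of sequence
    if i < len(seq) - 1:
        feats["next"] = seq[i + 1][0]
        feats["next_kind"] = seq[i + 1][1]
    else:
        feats["EOS"] = "1"  # end of sequence

    # Morpheme-interaction feature (an assumed stand-in): link an affix
    # to the nearest preceding stem, i.e. the stem it attaches to.
    if kind == "affix":
        j = i - 1
        while j >= 0 and seq[j][1] != "stem":
            j -= 1
        if j >= 0:
            feats["host_stem"] = seq[j][0]
            feats["stem+affix"] = seq[j][0] + "|" + surface
    return feats


def sequence_features(seq: List[Morpheme]) -> List[Dict[str, str]]:
    """Feature dicts for every morpheme in a segmented sequence."""
    return [morpheme_features(seq, i) for i in range(len(seq))]


# Placeholder tokens, not real Mongolian morphemes.
example = [("stem1", "stem"), ("affix1", "affix"), ("stem2", "stem")]
features = sequence_features(example)
```

A list of such dictionaries, paired with one tag per morpheme, is the general shape of input that dictionary-based CRF toolkits expect; the actual templates and statistics used in the paper may differ.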