Exploring Multiple Features for POS Guessing of Chinese Unknown Words with Maximum Entropy Models

Qi Wang, Yu He, Guohong Fu
2012-01-01
Abstract: A key requirement for high-performing part-of-speech (POS) tagging systems is the ability to accurately predict the POS categories of unknown words in open-ended text. However, unknown word guessing has received little attention for Chinese so far, largely because Chinese plain text contains very few explicit clues for it. In this paper, we explore morphological features for POS prediction of Chinese unknown words. To this end, we first take morphemes as the basic units of Chinese word formation and decompose each unknown word in a Chinese sentence into a sequence of morphemes associated with their morphological position patterns. We then combine different word-internal morphological clues with contextual information under the framework of maximum entropy modeling. Experimental results show that our method performs better in unknown word guessing than most of the best systems in the POS tagging closed tracks of the fourth ACL-SIGHAN bakeoff, demonstrating that morphological features are of great value to POS guessing of Chinese unknown words.
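The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, for simplicity, that every character is one morpheme, uses a hypothetical B/M/E/S position tag set (begin, middle, end, single), and invents the feature-name strings. The abstract does not specify the actual morpheme segmentation, position patterns, or feature templates. The resulting sparse feature dictionary is the kind of input a maximum entropy (multinomial logistic regression) classifier could consume.

```python
def position_patterns(morphemes):
    """Pair each morpheme with an assumed position tag:
    S (single-morpheme word), B (begin), M (middle), E (end)."""
    n = len(morphemes)
    if n == 1:
        return [(morphemes[0], "S")]
    tags = ["B"] + ["M"] * (n - 2) + ["E"]
    return list(zip(morphemes, tags))


def extract_features(word, prev_word, next_word):
    """Build binary features from word-internal and contextual clues.

    Assumption: each character is treated as one morpheme; a real system
    would use a morpheme lexicon or segmenter instead.
    """
    morphemes = list(word)
    feats = {}
    # Word-internal clues: morpheme plus its position pattern.
    for morpheme, tag in position_patterns(morphemes):
        feats[f"morph={morpheme}|pos={tag}"] = 1
    feats[f"len={len(morphemes)}"] = 1
    # Contextual clues: the neighbouring words in the sentence.
    feats[f"prev={prev_word}"] = 1
    feats[f"next={next_word}"] = 1
    return feats


if __name__ == "__main__":
    # Hypothetical unknown word with hypothetical left and right context.
    for name in sorted(extract_features("科学家", "是", "。")):
        print(name)
```

Here the word-final morpheme 家 yields the feature `morph=家|pos=E`, a word-internal clue of the sort that can point toward a noun reading regardless of the surrounding context.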