An Efficient Inductive Unsupervised Semantic Tagger

K T Lua
DOI: https://doi.org/10.48550/arXiv.cmp-lg/9606012
1996-06-11
Abstract:We report our development of a simple but fast and efficient inductive unsupervised semantic tagger for Chinese words. A POS hand-tagged corpus of 348,000 words is used. The corpus is being tagged in two steps. First, possible semantic tags are selected from a semantic dictionary(Tong Yi Ci Ci Lin), the POS and the conditional probability of semantic from POS, i.e., P(S|P). The final semantic tag is then assigned by considering the semantic tags before and after the current word and the semantic-word conditional probability P(S|W) derived from the first step. Semantic bigram probabilities P(S|S) are used in the second step. Final manual checking shows that this simple but efficient algorithm has a hit rate of 91%. The tagger tags 142 words per second, using a 120 MHz Pentium running FOXPRO. It runs about 2.3 times faster than a Viterbi tagger.
Computation and Language
What problem does this paper attempt to address?