Chinese Named Entity Recognition Using a Morpheme-Based Chunking Tagger

Guohong Fu
DOI: https://doi.org/10.1109/IALP.2009.68
2009-01-01
Abstract: Most previous studies formalize Chinese named entity recognition (NER) as a chunking task with either characters or lexicon words as the basic tokens for chunking. However, it is difficult under this formulation to exploit lexical information for NER. Furthermore, traditional NER chunking systems usually employ an exhaustive strategy for entity candidate generation, which makes entity decoding inefficient. In this paper we propose a morpheme-based chunking framework for Chinese NER and implement an efficient three-stage tagger using the pipeline strategy. To tackle the problem of out-of-vocabulary words and to exploit lexical cues for NER more effectively, we distinguish named entities from common words and choose morphemes as the basic tokens for entity chunking. To reduce the space of entity candidates and improve the efficiency of entity decoding, we employ internal entity formation pattern rules during entity candidate generation. Our experiments on different datasets show that our system can greatly improve NER efficiency without much degradation of performance.
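The contrast between exhaustive candidate generation and generation constrained by formation patterns can be illustrated with a minimal sketch. This is not the paper's implementation: the morpheme categories (SURNAME, GIVEN, LOC_STEM, LOC_SUFFIX, OTHER), the pattern table, the span-length limit, and both function names are hypothetical and chosen only to show how pattern rules shrink the candidate space.

```python
from typing import List, Tuple

# Hypothetical formation patterns: each entity type maps to the morpheme
# category sequences allowed to form it. Invented for illustration only.
PATTERNS = {
    "PER": {("SURNAME", "GIVEN"), ("SURNAME", "GIVEN", "GIVEN")},
    "LOC": {("LOC_STEM", "LOC_SUFFIX")},
}
MAX_LEN = 3  # assumed upper bound on morphemes per entity candidate


def exhaustive_candidates(tags: List[str], max_len: int = MAX_LEN) -> List[Tuple[int, int]]:
    """Return every span (start, end) of up to max_len morphemes."""
    n = len(tags)
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + max_len, n) + 1)]


def pattern_candidates(tags: List[str], max_len: int = MAX_LEN) -> List[Tuple[int, int, str]]:
    """Keep only spans whose category sequence matches a formation pattern."""
    kept = []
    for i, j in exhaustive_candidates(tags, max_len):
        seq = tuple(tags[i:j])
        for entity_type, patterns in PATTERNS.items():
            if seq in patterns:
                kept.append((i, j, entity_type))
    return kept


# Morphemes: 张 / 小 / 明 / 在 / 北京 / 市, with assumed categories.
tags = ["SURNAME", "GIVEN", "GIVEN", "OTHER", "LOC_STEM", "LOC_SUFFIX"]
print(len(exhaustive_candidates(tags)))  # 15 spans to score
print(pattern_candidates(tags))          # 3 spans survive the pattern filter
```

In this toy sentence the pattern filter leaves 3 candidates for the decoder instead of 15, which is the kind of reduction the abstract attributes to internal entity formation pattern rules. A real system would also need to resolve overlapping candidates, such as the two PER spans starting at the same surname.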