Chinese Named Entity Recognition with a Sequence Labeling Approach: Based on Characters, or Based on Words?

Zhangxun Liu, Conghui Zhu, Tiejun Zhao
DOI: https://doi.org/10.1007/978-3-642-14932-0_78
2010-01-01
Abstract: Named Entity Recognition (NER), an important problem in Natural Language Processing, is the basis for other applications such as Data Mining and Relation Extraction. Using a sequence labeling approach, this paper asks which kind of token should serve as the unit of granularity in the NER task: characters or words. In addition, we use not only local context features within a sentence but also global knowledge features extracted from other occurrences of each word in the whole corpus. The results show that, without the global features, person names and location names are recognized well with character-based labeling, whereas organization names are better suited to word-based labeling. When global features are added, the performance of the word-based approach improves significantly.
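The character-versus-word question can be made concrete with a small sketch. It assumes a BIO tagging scheme and an already segmented word sequence, neither of which the abstract specifies; the example sentence ("Li Ming visits Shanghai"), the entity spans, and the function names `char_bio` and `word_bio` are all hypothetical. The sketch labels the same entities at both granularities and shows that word-level labels exist only when segmentation boundaries line up with entity boundaries.

```python
from typing import List, Optional, Tuple

# Hypothetical entity annotation: (start, end) character offsets, end exclusive, plus type.
Span = Tuple[int, int, str]


def char_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Character-based labeling: every character receives a BIO tag."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return list(zip(text, tags))


def word_bio(words: List[str], spans: List[Span]) -> Optional[List[Tuple[str, str]]]:
    """Word-based labeling: every segmented word receives a BIO tag.

    Returns None when an entity boundary falls inside a word, i.e. the
    segmentation cannot represent that entity.
    """
    # Character offsets covered by each word.
    offsets = []
    pos = 0
    for w in words:
        offsets.append((pos, pos + len(w)))
        pos += len(w)
    starts = {s for s, _ in offsets}
    ends = {e for _, e in offsets}

    tags = ["O"] * len(words)
    for start, end, etype in spans:
        if start not in starts or end not in ends:
            return None  # entity boundary splits a word
        inside = [i for i, (s, e) in enumerate(offsets) if s >= start and e <= end]
        tags[inside[0]] = f"B-{etype}"
        for i in inside[1:]:
            tags[i] = f"I-{etype}"
    return list(zip(words, tags))


text = "李明访问上海"  # hypothetical example: "Li Ming visits Shanghai"
spans = [(0, 2, "PER"), (4, 6, "LOC")]
print(char_bio(text, spans))
print(word_bio(["李明", "访问", "上海"], spans))  # segmentation aligned with entities
print(word_bio(["李", "明访问", "上海"], spans))  # hypothetical segmentation error -> None
```

The `None` result shows how a segmentation error can make an entity unrepresentable at word granularity. Character granularity avoids that dependency, at the cost of working with shorter tokens that carry less information on their own.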