Research on Chinese Lexical Analysis System by Fusing Multiple Knowledge Sources
Wei JIANG, Xiao-Long WANG, Yi GUAN, Jian ZHAO
DOI: https://doi.org/10.3321/j.issn:0254-4164.2007.01.016
2007-01-01
Jisuanji Xuebao/Chinese Journal of Computers
Abstract: Chinese lexical analysis is a foundational task for most Chinese natural language processing. This paper discusses word segmentation, POS tagging, and named entity recognition, as well as the relations among them. It also presents a pragmatic lexical analysis system based on mixed language models. The system combines several models, including n-gram, hidden Markov model, maximum entropy model, support vector machine, and conditional random fields, each of which performs well on a particular sub-task. The word segmenter participated in the Second International Chinese Word Segmentation Bakeoff in 2005 and achieved F-measures of 97.2% and 96.7% on the MSR and PKU open tests, respectively. In an open test on a corpus drawn from six months of the People's Daily, the POS tagging module achieved 96.1% precision and the named entity recognition module achieved an F-measure of 88.6%.
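As an illustration of the F-measure figures quoted in the abstract, the following is a minimal sketch of how precision, recall, and F-measure are commonly computed for word segmentation by comparing word spans. It is not the bakeoff's scoring script or the authors' code; the function names `to_spans` and `segmentation_prf` and the sample sentence are assumptions made for this example.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_words, pred_words):
    """Return (precision, recall, F-measure) for one segmented sentence.

    A predicted word counts as correct only if both of its boundaries
    match a word in the gold segmentation.
    """
    gold = to_spans(gold_words)
    pred = to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical example: the prediction splits one gold word in two.
gold = ["我们", "喜欢", "自然语言", "处理"]
pred = ["我们", "喜欢", "自然", "语言", "处理"]
p, r, f = segmentation_prf(gold, pred)
# 3 of 5 predicted words are correct, and 3 of 4 gold words are recovered.
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Here F-measure is the harmonic mean of precision and recall, so over-segmenting a single word lowers both scores at once.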