Parsing Penn Chinese Treebank Based on Lexicalized Model

Hailong Cao, Yujie Zhang, Hitoshi Isahara
2007-01-01
Abstract: Syntactic parsing is one of the most important technologies in natural language processing. The development of the Penn Chinese Treebank (CTB) has spurred research on Chinese parsing. This paper describes a lexicalized statistical Chinese parser. First, a lexicalized model based on the hidden Markov model (HMM) is proposed for part-of-speech (POS) tagging. Second, a well-known lexicalized model, the head-driven model, is adapted to parse the automatically POS-tagged Chinese sentences. The construction of the parser is described, and the effects of details that can make a great difference in parsing performance are analyzed. On sentences shorter than 100 words, the parser achieves 80.08% precision and 78.45% recall, surpassing the best published results.
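The tagging step rests on HMM decoding. What follows is a minimal sketch of Viterbi decoding for a plain bigram HMM tagger. It is not the authors' lexicalized model, which additionally conditions on words in ways this abstract does not specify. The function name `viterbi`, the unknown-word floor `unk`, the CTB-style tags and all probabilities are hypothetical, chosen only for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most probable tag sequence for `words` under a plain bigram HMM.

    start_p[t]    : P(tag t at position 0)
    trans_p[s][t] : P(tag t | previous tag s)
    emit_p[t][w]  : P(word w | tag t); unseen words get the hypothetical floor `unk`
    Scores are kept in log space so long sentences do not underflow.
    """
    if not words:
        return []

    def lp(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: best log score of any tag path that ends in tag t at word i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], unk)) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((s, best[i - 1][s] + lp(trans_p[s].get(t, 0))) for s in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], unk))
            back[i][t] = prev

    # Trace the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model with hypothetical probabilities (not estimated from CTB).
TAGS = ["PN", "VV", "NN"]
START = {"PN": 0.6, "VV": 0.1, "NN": 0.3}
TRANS = {
    "PN": {"PN": 0.1, "VV": 0.7, "NN": 0.2},
    "VV": {"PN": 0.2, "VV": 0.1, "NN": 0.7},
    "NN": {"PN": 0.1, "VV": 0.5, "NN": 0.4},
}
EMIT = {
    "PN": {"我": 0.9},
    "VV": {"喜欢": 0.8},
    "NN": {"苹果": 0.6, "喜欢": 0.1},
}

if __name__ == "__main__":
    print(viterbi(["我", "喜欢", "苹果"], TAGS, START, TRANS, EMIT))  # ['PN', 'VV', 'NN']
```

Dynamic programming keeps only the best path into each tag at each position. Decoding therefore costs O(n·|T|²) for n words and |T| tags, instead of enumerating all |T|ⁿ tag sequences.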