Parsing TCT with Split Conjunction Categories.

Dongchen Li, Xihong Wu
2012-01-01
Abstract: We demonstrate that an unlexicalized PCFG with refined conjunction categories can parse much more accurately than previously shown. It does so through simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar and reflect grammatical properties idiosyncratic to Chinese. Indeed, its performance is the best single-model result in the 3rd Chinese Parsing Evaluation. This result shows that refining function words to represent Chinese subcategorization (subcat) frames is an effective method. An unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and its parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.