Build a Chinese Treebank as the test suite for Chinese parser

Zhou Qiang, Sun Maosong
2008-01-01
Abstract: This paper introduces our ongoing work to build a Chinese treebank that can serve as a test suite for Chinese parsers. The treebank will consist of 10,000 Chinese sentences extracted from a balanced Chinese corpus of about 2,000,000 Chinese characters. The corpus has already been annotated with correct segmentation and part-of-speech (POS) information. The paper discusses the following issues: a survey of the balanced corpus, the strategies and methods for sampling the treebank sentences, and the processing schemes and tools for treebank construction.