A Study on Constituent-to-Dependency Conversion

李正华,车万翔,刘挺
DOI: https://doi.org/10.3969/j.issn.1003-0077.2008.06.003
2008-01-01
Abstract: Chinese dependency treebank construction has fallen behind that of other languages, such as English, in both scale and quality. Building a large-scale treebank requires substantial human and material resources, and guaranteeing its quality is very difficult. In this paper, we explore a new method that combines rule-based and statistical approaches to convert a constituent treebank, the Penn Chinese Treebank, into a dependency treebank that follows the annotation standard of the HIT Chinese Dependency Treebank (HIT-IR-CDT). We enlarge the training data by adding the converted treebank to HIT-IR-CDT and retrain the dependency parser. Experiments show that adding a small amount of converted data improves the parser's performance, while adding a large amount degrades it. Based on a detailed analysis, we believe that constituent-to-dependency treebank conversion, as a way of improving dependency parsing by exploiting different treebanks, still needs in-depth research.