Active Learning for Chinese Dependency Parsing

CHE Wanxiang, ZHANG Meishan, LIU Ting
DOI: https://doi.org/10.3969/j.issn.1003-0077.2012.02.004
2012-01-01
Abstract: Building a statistical dependency parser requires a large annotated treebank, but acquiring such a treebank is time-consuming, tedious, and expensive. This paper presents a method to reduce this demand via active learning, which selects the most uncertain samples for annotation instead of annotating the whole training corpus. Experiments are carried out on the HIT-CIR-CDT. The results show that, with the same number of training samples, active learning raises parsing accuracy by about 0.8 percent. In other words, to reach about the same parsing accuracy, we only need to annotate 70% of the samples compared with the usual random selection method.
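The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how uncertainty is measured, so the sketch assumes one common choice: the average entropy of each token's head-attachment distribution. The function names, the `pool` structure, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's candidate-head distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_uncertainty(head_probs: Sequence[Sequence[float]]) -> float:
    """Average token entropy, normalised by length so that long
    sentences are not favoured merely for having more tokens.
    This measure is an assumption, not the paper's stated metric."""
    if not head_probs:
        return 0.0
    return sum(token_entropy(p) for p in head_probs) / len(head_probs)


def select_most_uncertain(pool: Dict[str, List[List[float]]], k: int) -> List[str]:
    """Return the ids of the k unlabeled sentences the parser is least sure about."""
    ranked = sorted(pool, key=lambda sid: sentence_uncertainty(pool[sid]), reverse=True)
    return ranked[:k]


# Hypothetical unlabeled pool: sentence id -> one head distribution per token.
pool = {
    "s1": [[0.9, 0.1], [0.8, 0.2]],
    "s2": [[0.5, 0.5], [0.6, 0.4]],
    "s3": [[1.0, 0.0], [0.99, 0.01]],
}

# Sentences chosen for manual annotation in this round.
batch = select_most_uncertain(pool, k=2)
```

In a full active learning loop, the selected sentences would be annotated, added to the training set, and the parser retrained before the next selection round. The random-selection baseline mentioned in the abstract would replace the ranking with a uniform sample from the pool.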