Automatic Learning and Refinement Algorithm for Chinese Base Chunk Rules

ZHOU Qiang
DOI: https://doi.org/10.3321/j.issn:1000-0054.2008.01.024
2008-01-01
Abstract: A method is presented to automatically learn and refine Chinese base chunk rules using a large annotated corpus and a lexical knowledge base. After extracting all possible part-of-speech-based rules from the annotated corpus, the system first prunes most of the useless rules and expands some low-reliability rules with hierarchical knowledge drawn from internal lexical relationships and external contextual restrictions. The system then refines the rules into structural rules with stronger descriptive capabilities. A confidence score is computed to evaluate rule reliability during the learning procedure, and an expected accuracy index is used to evaluate the descriptive capabilities of the refined rule base. Test results indicate that the algorithm can acquire about 16% of the useful expanded rules to cover 93% of the annotated positive examples and can improve the expected accuracy from 51% to 81%.
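The extract-then-prune step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the confidence formula, so the sketch assumes a simple relative frequency: the number of times a part-of-speech sequence is annotated as a chunk, divided by the number of times that sequence occurs anywhere in the corpus. The function name learn_rules, the corpus layout, the min_conf threshold, and the tag names are all hypothetical, and the lexical expansion and structural refinement steps are not modeled.

```python
from collections import Counter


def learn_rules(corpus, min_conf=0.5):
    """Extract POS-sequence chunk rules and prune the unreliable ones.

    corpus: list of (tags, chunks) pairs, where tags is a list of POS tags
    and chunks is a list of (start, end, label) spans with end exclusive.
    Returns (confidence, kept): scores for all rules, and the rules whose
    score is at least min_conf.
    """
    # Positive examples: each annotated chunk yields one (label, POS sequence) rule.
    positive = Counter()
    for tags, chunks in corpus:
        for start, end, label in chunks:
            positive[(label, tuple(tags[start:end]))] += 1

    # Count every occurrence of each candidate POS sequence, chunk or not.
    patterns = {pattern for _, pattern in positive}
    lengths = {len(pattern) for pattern in patterns}
    occurrences = Counter()
    for tags, _ in corpus:
        for n in lengths:
            for i in range(len(tags) - n + 1):
                window = tuple(tags[i:i + n])
                if window in patterns:
                    occurrences[window] += 1

    # Assumed confidence: relative frequency of the sequence being a chunk.
    confidence = {
        rule: count / occurrences[rule[1]] for rule, count in positive.items()
    }
    kept = {rule: c for rule, c in confidence.items() if c >= min_conf}
    return confidence, kept
```

In this sketch a rule such as ("np", ("a", "n")) keeps a high score only if the tag sequence is annotated as a chunk nearly every time it appears; sequences that frequently occur outside annotated chunks fall below the threshold and would be the candidates for the kind of expansion with lexical and contextual knowledge that the abstract mentions.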