A Divide-Conquer Strategy for Both English and Chinese Text Chunking

Ying-Hong Liang, Ni-Hong Wang, Zhao-wen Qiu, Yin Chen, Tie-jun Zhao
DOI: https://doi.org/10.1109/alpit.2007.36
2007-01-01
Abstract: The traditional approach to English text chunking identifies all phrases with a single model and the same types of features. It has been shown that using only one model has two limitations: the same types of features are not suitable for all phrases, and data sparseness may result. In this paper, a divide-conquer strategy is proposed and applied to the identification of English phrases. The strategy is then rapidly transplanted to Chinese text chunking. It divides the chunking task into several sub-tasks according to the sensitive features of each phrase and identifies the different phrases in parallel. A two-stage decreasing-conflict strategy is then used to synthesize the answers of the sub-tasks. The main features of the approach are, first, that each phrase uses its own sensitive features and, second, that data sparseness is avoided. In tests on a public corpus (English) and the Chinese Penn Treebank (Chinese), the F score reaches 95.14% for English chunking and 95.23% for Chinese chunking. These results are state of the art, in line with the best results that have been reported.
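The pipeline outlined in the abstract (one sub-task per phrase type, each with its own features, followed by a two-stage merge) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy detectors keyed on POS-tag prefixes, the fixed confidence scores, and the names `make_detector`, `resolve`, and `chunk`. In particular, the abstract does not say what the two conflict-resolution stages are; the sketch simply accepts uncontested proposals first and then settles overlaps by score.

```python
from typing import Callable, List, NamedTuple, Tuple

Token = Tuple[str, str]  # (word, POS tag)


class Span(NamedTuple):
    start: int    # inclusive token index
    end: int      # exclusive token index
    label: str    # phrase type, e.g. "NP"
    score: float  # hypothetical confidence, fixed per detector here


Detector = Callable[[List[Token]], List[Span]]


def make_detector(label: str, tag_prefixes: Tuple[str, ...], score: float) -> Detector:
    """Build a toy sub-task: propose maximal runs of tokens whose POS tag
    starts with one of tag_prefixes (the 'sensitive features' stand-in)."""
    def detect(tokens: List[Token]) -> List[Span]:
        spans, start = [], None
        for i, (_, tag) in enumerate(tokens):
            inside = tag.startswith(tag_prefixes)
            if inside and start is None:
                start = i
            elif not inside and start is not None:
                spans.append(Span(start, i, label, score))
                start = None
        if start is not None:
            spans.append(Span(start, len(tokens), label, score))
        return spans
    return detect


def overlaps(a: Span, b: Span) -> bool:
    return a.start < b.end and b.start < a.end


def resolve(proposals: List[Span]) -> List[Span]:
    """Assumed two-stage merge (a placeholder, not the paper's rule)."""
    # Stage 1: accept every proposal that no other proposal overlaps.
    accepted = [p for p in proposals
                if not any(overlaps(p, q) for q in proposals if q is not p)]
    contested = [p for p in proposals if p not in accepted]
    # Stage 2: settle remaining conflicts greedily, highest score first.
    for p in sorted(contested, key=lambda s: -s.score):
        if not any(overlaps(p, q) for q in accepted):
            accepted.append(p)
    return sorted(accepted)


def chunk(tokens: List[Token], detectors: List[Detector]) -> List[Span]:
    # Each detector is independent of the others, so these calls could
    # run concurrently; they run sequentially here for simplicity.
    proposals = [s for detect in detectors for s in detect(tokens)]
    return resolve(proposals)


DETECTORS = [
    make_detector("NP", ("DT", "JJ", "NN", "PRP"), 0.9),
    make_detector("VP", ("MD", "VB", "RB"), 0.8),
    make_detector("ADVP", ("RB",), 0.6),
]

example = [("He", "PRP"), ("quickly", "RB"), ("ran", "VBD"), ("home", "NN")]
for span in chunk(example, DETECTORS):
    print(span.label, [word for word, _ in example[span.start:span.end]])
```

In this toy run the VP and ADVP detectors both claim "quickly", so that token is contested and goes to the higher-scoring VP proposal, while the two NP proposals pass through the first stage untouched. A real system would replace each detector with a trained model over that phrase type's own feature set.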