Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning.

Xipeng Qiu,Jiayi Zhao,Xuanjing Huang
DOI: https://doi.org/10.18653/v1/d13-1062
2013-01-01
Abstract:Chinese word segmentation and part-ofspeech tagging (S&T) are fundamental steps for more advanced Chinese language processing tasks. Recently, it has attracted more and more research interests to exploit heterogeneous annotation corpora for Chinese S&T. In this paper, we propose a unified model for Chinese S&T with heterogeneous annotation corpora. We first automatically construct a loose and uncertain mapping between two representative heterogeneous corpora, Penn Chinese Treebank (CTB) and PKU’s People’s Daily (PPD). Then we regard the Chinese S&T with heterogeneous corpora as two “related” tasks and train our model on two heterogeneous corpora simultaneously. Experiments show that our method can boost the performances of both of the heterogeneous corpora by using the shared information, and achieves significant improvements over the state-of-the-art methods.
What problem does this paper attempt to address?