Addressing Domain Adaptation for Chinese Word Segmentation with Instances-Based Transfer Learning.

Yanna Zhang, Jinan Xu, Guoyi Miao, Yufeng Chen, Yujie Zhang
DOI: https://doi.org/10.1007/978-3-030-01716-3_3
2018-01-01
Abstract: Recent studies have shown that neural networks are effective for Chinese Word Segmentation (CWS). However, these models are constrained by the domain and size of the training corpus and do not work well in domain adaptation. In this paper, we propose a novel instance-transferring method, which uses valuable annotated target-domain instances to improve CWS on different domains. Specifically, we introduce semantic similarity computation based on character-based n-gram embeddings to select instances. Furthermore, training sentences similar to the instances are used to help annotate the instances. Experimental results show that our method can effectively boost cross-domain segmentation performance. We achieve state-of-the-art results on Internet literature datasets, and results competitive with the best reported on micro-blog datasets.
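To make the instance-selection idea concrete, the following is a minimal sketch of similarity-based selection over character n-grams. The abstract does not specify how the n-gram embeddings are trained, how they are combined into a sentence representation, or which similarity measure is used, so everything here is an assumption for illustration: the hash-derived `toy_embedding` stands in for a trained embedding table, sentence vectors are plain averages, similarity is cosine, and all function names are hypothetical.

```python
import hashlib
import math

DIM = 16  # toy embedding size (assumption, not from the paper)


def char_ngrams(sentence, n=2):
    """Return the character n-grams of a sentence."""
    if len(sentence) < n:
        return [sentence] if sentence else []
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]


def toy_embedding(ngram):
    """Stand-in for a trained n-gram embedding: a deterministic hash-derived vector."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def sentence_vector(sentence, n=2):
    """Average the n-gram vectors of a sentence (averaging is an assumption)."""
    grams = char_ngrams(sentence, n)
    if not grams:
        return [0.0] * DIM
    vecs = [toy_embedding(g) for g in grams]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_similar(query, pool, k=1, n=2):
    """Return the k sentences in pool most similar to query, as (score, sentence) pairs."""
    query_vec = sentence_vector(query, n)
    scored = [(cosine(query_vec, sentence_vector(s, n)), s) for s in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In this sketch, `select_similar(target_sentence, training_sentences, k=5)` would return the five training sentences closest to a target-domain sentence. With a real embedding table in place of `toy_embedding`, the scores would reflect semantic rather than purely hash-based similarity.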