Domain Adaptation for CRF-based Chinese Word Segmentation using Free Annotations.

Yijia Liu, Yue Zhang, Wanxiang Che, Ting Liu, Fan Wu
DOI: https://doi.org/10.3115/v1/d14-1093
2014-01-01
Abstract: Supervised methods have been the dominant approach to Chinese word segmentation, but their performance can drop significantly when the test domain differs from the training domain. In this paper, we study the problem of obtaining partial annotation from freely available data to help Chinese word segmentation across domains. Different sources of free annotations are transformed into a unified form of partial annotation, and a variant CRF model is used to leverage both fully and partially annotated data consistently. Experimental results show that the Chinese word segmentation model benefits from free partially annotated data. On the SIGHAN Bakeoff 2010 data, we achieve results that are competitive with the best reported in the literature.
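The abstract does not spell out the partial-annotation format or the CRF variant, so the following is only a minimal sketch of one common way to train a linear-chain CRF on partially annotated sentences. It assumes (this is not taken from the paper) a B/M/E/S tag set, that a known word boundary restricts the neighbouring characters to {E, S} and {B, S}, and that the objective is the log of the total score of all label paths consistent with the partial annotation minus the log of the total score of all paths. The names `allowed_from_boundaries`, `log_partition`, and `partial_log_likelihood` are hypothetical.

```python
import math

# Assumed tag set: Begin / Middle / End of a word, or Single-character word.
LABELS = ["B", "M", "E", "S"]


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def allowed_from_boundaries(n, boundaries):
    """Turn known word boundaries into per-character sets of allowed labels.

    `boundaries` holds indices i (0 <= i < n - 1) meaning that a word is known
    to end at character i and a new word starts at character i + 1.
    Characters with no evidence keep the full label set.
    """
    allowed = [set(LABELS) for _ in range(n)]
    for i in boundaries:
        allowed[i] &= {"E", "S"}
        allowed[i + 1] &= {"B", "S"}
    return allowed


def log_partition(emit, trans, allowed):
    """Forward algorithm over label paths restricted to `allowed`.

    emit[i][y] is the score of label y at position i;
    trans[p][y] is the score of moving from label p to label y.
    """
    alpha = {y: emit[0][y] for y in allowed[0]}
    for i in range(1, len(emit)):
        alpha = {
            y: emit[i][y] + logsumexp([alpha[p] + trans[p][y] for p in alpha])
            for y in allowed[i]
        }
    return logsumexp(list(alpha.values()))


def partial_log_likelihood(emit, trans, allowed):
    """Log-probability mass of all paths consistent with the partial annotation."""
    unconstrained = [set(LABELS) for _ in emit]
    return log_partition(emit, trans, allowed) - log_partition(emit, trans, unconstrained)
```

Under these assumptions a fully annotated sentence is simply the special case where every allowed set contains exactly one label, which is why a single objective can cover both kinds of data. With all scores set to zero, a three-character sentence with one known boundary after the first character leaves 2 × 2 × 4 = 16 of the 64 possible paths, so `partial_log_likelihood` returns log(16/64).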