Unsupervised Learning helps Supervised Neural Word Segmentation

Xiaobin Wang, Deng Cai, Guangwei Xu, Hai Zhao, Linlin Li, Luo Si
DOI: https://doi.org/10.1609/aaai.v33i01.33017200
2019-01-01
Abstract: To exploit unlabeled data for further improving Chinese word segmentation, this work makes the first attempt at adding unsupervised segmentation information to a neural supervised segmenter. We survey various effective strategies for leveraging unsupervised information derived from abundant unlabeled data, including extending the character embedding, augmenting the word score, and applying multi-task learning. Experiments on standard data sets show that the explored strategies indeed improve the recall of out-of-vocabulary words and thus boost segmentation accuracy. Moreover, the model enhanced by the proposed methods outperforms state-of-the-art models in the closed test and shows a promising improvement trend when the three different strategies are adopted with the help of a large unlabeled data set. Our thorough empirical study verifies that the proposed approach outperforms the widely used pre-training approach in making effective use of abundant, freely available unlabeled data.