Word Segmentation of Micro Blogs with Bagging.

Zhenting Yu, Xin-Yu Dai, Si Shen, Shujian Huang, Jiajun Chen
DOI: https://doi.org/10.1007/978-3-319-25207-0_54
2015-01-01
Abstract: This paper describes the model we designed for the Chinese word segmentation task of NLPCC 2015. We first apply a word-based perceptron algorithm to build the base segmenter. We then apply bootstrap aggregating (bagging), which improves the segmentation results consistently on all three tracks: closed, semi-open, and open. Considering the characteristics of Weibo text, we also perform rule-based adaptation before decoding. Our final model achieves an F-score of 95.12% on the closed track, 95.3% on the semi-open track, and 96.09% on the open track.
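
To make the bagging step concrete, here is a minimal Python sketch of bootstrap aggregating applied to segmentation. It is an illustration under stated assumptions, not the authors' implementation: the base learner is a toy character-pair boundary counter swapped in for the word-based perceptron, and combining models by per-boundary majority vote is an assumption, since the abstract does not say how the outputs are aggregated. All function names (`train_base`, `train_bagging`, `segment`) are hypothetical.

```python
import random
from collections import Counter


def boundaries(words):
    """Return the character offsets at which a word ends inside the sentence."""
    cuts, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return cuts


def train_base(sentences):
    """Toy base segmenter (assumption: stands in for the word-based perceptron).

    For every adjacent character pair it counts how often a word boundary
    falls between the two characters and how often it does not.
    """
    split, join = Counter(), Counter()
    for words in sentences:
        text = "".join(words)
        cuts = boundaries(words)
        for i in range(1, len(text)):
            pair = text[i - 1:i + 1]
            if i in cuts:
                split[pair] += 1
            else:
                join[pair] += 1

    def predict(text):
        # Predict a boundary when splitting was seen at least as often as
        # joining; unseen pairs (0 vs 0) therefore default to a boundary.
        return {
            i for i in range(1, len(text))
            if split[text[i - 1:i + 1]] >= join[text[i - 1:i + 1]]
        }

    return predict


def train_bagging(sentences, n_models=5, seed=0):
    """Train n_models base segmenters, each on a bootstrap sample
    (sampling with replacement, same size as the training set)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = rng.choices(sentences, k=len(sentences))
        models.append(train_base(sample))
    return models


def segment(models, text):
    """Segment text by majority vote over each candidate boundary
    (assumption: the voting scheme is illustrative only)."""
    if not text:
        return []
    votes = Counter()
    for model in models:
        votes.update(model(text))
    cuts = sorted(i for i, v in votes.items() if 2 * v > len(models))
    words, start = [], 0
    for cut in cuts + [len(text)]:
        words.append(text[start:cut])
        start = cut
    return words
```

A usage example would be `models = train_bagging(gold_sentences)` followed by `segment(models, "我爱北京")`, where `gold_sentences` is a list of gold-standard segmented sentences, each a list of words. Voting per boundary rather than per whole sentence keeps the combination step simple, because different models rarely produce identical full segmentations.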