Overview of the NLPCC-ICCPOL 2016 Shared Task: Chinese Word Segmentation for Micro-blog Texts

Xipeng Qiu, Peng Qian, Zhan Shi
DOI: https://doi.org/10.1007/978-3-319-50496-4_84
2016-01-01
Abstract: In this paper, we give an overview of the shared task at the 5th CCF Conference on Natural Language Processing & Chinese Computing (NLPCC 2016): Chinese word segmentation for micro-blog texts. Unlike the commonly used newswire datasets, the dataset for this shared task consists of relatively informal micro-texts. In addition, we use a new psychometric-inspired evaluation metric for Chinese word segmentation, which aims to balance the highly skewed word distribution across different levels of difficulty. The data and evaluation code can be downloaded from https://github.com/FudanNLP/NLPCC-WordSeg-Weibo.