A New Error-driven Learning Approach for Chinese Word Segmentation

XIA Xin-Song, XIAO Jian-Guo
2006-01-01
Computer Science
Abstract: A well-known problem in Chinese word segmentation (CWS) is that there is no unique definition of a word, so different standards can yield different segmentation outputs. Developing a separate CWS system for each application or standard is impractical, so it is important to be able to adapt the output of an existing CWS system flexibly to different standards or applications. This paper presents a linguistically enriched transformation-based learning approach that performs CWS adaptation as a postprocessor. Unlike other transformation-based learning methods used in CWS, the approach draws on linguistic information, introducing word class and word-internal structure into the rule templates and transformations. The approach is evaluated on four test sets representing four different standards. Its performance is comparable to that of several state-of-the-art approaches that perform Chinese word segmentation under a single standard.
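To make the idea of adaptation as a postprocessor concrete, the following is a minimal sketch of generic error-driven, transformation-based learning over segmenter output. It is not the authors' implementation. The `MergeRule` template, the word-class tags (`NS`, `N`, `NT`, `U`), the boundary-based error count, and the toy sentence are all assumptions made for illustration. The sketch uses word class only and supports only merge transformations, whereas the abstract also mentions word-internal structure, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# A token is (word, word_class). The class tags are hypothetical.
Token = Tuple[str, str]


@dataclass(frozen=True)
class MergeRule:
    """Hypothetical template: merge two adjacent words by their classes."""
    left_class: str
    right_class: str
    new_class: str


def apply_rules(tokens: List[Token], rules: List[MergeRule]) -> List[Token]:
    """Apply each rule in order, scanning the sentence left to right."""
    out = list(tokens)
    for rule in rules:
        merged, i = [], 0
        while i < len(out):
            if (i + 1 < len(out)
                    and out[i][1] == rule.left_class
                    and out[i + 1][1] == rule.right_class):
                merged.append((out[i][0] + out[i + 1][0], rule.new_class))
                i += 2
            else:
                merged.append(out[i])
                i += 1
        out = merged
    return out


def boundaries(tokens: List[Token]) -> Set[int]:
    """Character offsets at which a word ends."""
    pos, ends = 0, set()
    for word, _ in tokens:
        pos += len(word)
        ends.add(pos)
    return ends


def errors(pred: List[Token], gold: List[Token]) -> int:
    """Count boundaries present in exactly one of the two segmentations."""
    return len(boundaries(pred) ^ boundaries(gold))


def learn(corpus, candidates: List[MergeRule], max_rules: int = 10):
    """Greedily pick the candidate rule that removes the most errors."""
    current = [list(base) for base, _ in corpus]
    golds = [gold for _, gold in corpus]
    learned = []
    for _ in range(max_rules):
        base_err = sum(errors(c, g) for c, g in zip(current, golds))
        best, best_gain = None, 0
        for rule in candidates:
            err = sum(errors(apply_rules(c, [rule]), g)
                      for c, g in zip(current, golds))
            if base_err - err > best_gain:
                best, best_gain = rule, base_err - err
        if best is None:  # no candidate reduces the error count
            break
        learned.append(best)
        current = [apply_rules(c, [best]) for c in current]
    return learned


# Toy data (assumed): the target standard treats 北京大学 as one word.
baseline = [("北京", "NS"), ("大学", "N"), ("的", "U"), ("学生", "N")]
gold = [("北京大学", "NT"), ("的", "U"), ("学生", "N")]
candidates = [MergeRule("NS", "N", "NT"), MergeRule("U", "N", "N")]

learned = learn([(baseline, gold)], candidates)
adapted = apply_rules(baseline, learned)
```

In this sketch the learner never retrains the underlying segmenter. It only learns an ordered list of corrections against the target standard, which is what lets one existing system serve several standards.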