A Pragmatic Approach to Increase Accuracy of Chinese Word-Segmentation

Chen Wenyu, Chen Biao, Xiang Tao, Zhang Zhongquan
DOI: https://doi.org/10.1109/ifita.2010.262
2010-01-01
Abstract: Chinese word segmentation is essential for understanding and processing Chinese natural language, and it is an important component of search engines, text retrieval, speech recognition, and machine translation. It is challenging because Chinese text has no spaces or other physical markers of word boundaries, and it is often difficult to define what constitutes a word in Chinese. No fully mature, practical Chinese word segmentation system is yet available, particularly with respect to segmentation accuracy. This article presents a pragmatic approach to Chinese word segmentation that aims to increase segmentation accuracy. We introduce a mechanism for combining a hybrid dictionary with a universal dictionary, design a practical data structure, describe the segmentation algorithm, and report test results.
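
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how dictionary-based segmentation over two word lists might look. It uses forward maximum matching, a common baseline technique that is assumed here for illustration and is not confirmed to be the authors' method. The function name `segment`, the `max_len` parameter, and the sample dictionary entries are all hypothetical.

```python
def segment(text, domain_dict, general_dict, max_len=4):
    """Forward maximum matching over two dictionaries (illustrative only).

    domain_dict and general_dict are plain sets of words; treating them as
    stand-ins for the paper's two dictionaries is an assumption, since the
    abstract does not describe the actual data structure.
    """
    words = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in domain_dict or candidate in general_dict:
                match = candidate
                break
        # Fall back to a single character when no dictionary entry matches.
        if match is None:
            match = text[i]
        words.append(match)
        i += len(match)
    return words


# Hypothetical sample dictionaries, not taken from the paper.
GENERAL = {"中文", "分词", "重要", "很"}
DOMAIN = {"中文分词"}

result = segment("中文分词很重要", DOMAIN, GENERAL)
print(result)
```

Because the longest match wins, the entry "中文分词" from the second dictionary is kept as one word rather than being split into "中文" and "分词". This shows one way in which combining dictionaries could affect segmentation output.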