New Word Detection Algorithm for Chinese Based on Extraction of Local Context Information

Hua-Lin Zeng, Chang-Le Zhou, Xiao-Dong Shi, Tang-Qiu Li, Chang Su
DOI: https://doi.org/10.1109/iske.2008.4731038
2010-01-01
Abstract: Chinese word segmentation is an important issue in Chinese text processing. Traditional segmentation methods that depend on an existing dictionary perform poorly when they encounter unknown words. This paper proposes a segmentation algorithm for Chinese based on extracting local context information. It adds context information from the test text to a local PPM statistical model to guide the detection of new words. The algorithm, which focuses on online segmentation and new word detection, performs well in both closed and open tests and, to a certain extent, outperforms some well-known Chinese segmentation systems.
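The abstract does not spell out the model, so the following is only a rough sketch of the general idea, assuming PPM refers to prediction by partial matching. It uses an order-1 character model with a PPM-style escape (method C, without exclusion) and shows how feeding text from the document under test into the model raises the probability of a character pair the background text never contained. The class name `AdaptiveBigramModel`, the `alphabet_size` bound, and the sample strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


class AdaptiveBigramModel:
    """Order-1 character model with a PPM-style escape to order 0.

    Illustrative only: not the paper's algorithm.
    """

    def __init__(self, alphabet_size=6000):
        # Assumed upper bound on the number of distinct characters.
        self.alphabet_size = alphabet_size
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)

    def update(self, text):
        """Add character and character-pair counts from text."""
        for i, ch in enumerate(text):
            self.unigrams[ch] += 1
            if i > 0:
                self.bigrams[text[i - 1]][ch] += 1

    def _p0(self, ch):
        """Order-0 probability, escaping to a uniform distribution."""
        total = sum(self.unigrams.values())
        distinct = len(self.unigrams)
        if total == 0:
            return 1.0 / self.alphabet_size
        if self.unigrams[ch] > 0:
            return self.unigrams[ch] / (total + distinct)
        return (distinct / (total + distinct)) / self.alphabet_size

    def prob(self, prev, ch):
        """P(ch | prev), escaping to order 0 when the pair is unseen."""
        ctx = self.bigrams.get(prev)
        if not ctx:
            return self._p0(ch)
        total = sum(ctx.values())
        distinct = len(ctx)
        if ctx[ch] > 0:
            return ctx[ch] / (total + distinct)
        return (distinct / (total + distinct)) * self._p0(ch)


model = AdaptiveBigramModel()
model.update("我们研究中文分词")  # stand-in for background training text
before = model.prob("博", "客")

model.update("博客很流行，博客作者很多")  # local context from the test text
after = model.prob("博", "客")

print(before < after)
```

A pair whose conditional probability rises sharply after the local update is a plausible candidate for a new word; how the paper actually scores and thresholds candidates is not stated in the abstract.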