A Comparative Study Of The Effect Of Word Segmentation On Chinese Terminology Extraction

Luning Ji, Qin Lu, Wenjie Li, Yi-Rong Chen
2006-01-01
Abstract: Automatic term extraction is the first step towards the automatic or semi-automatic updating of an existing domain knowledge base. Most previous research applies word segmentation as a preprocessing step for Chinese term extraction. However, segmentation ambiguity is unavoidable, especially when identifying unknown words in Chinese. In this paper, we discuss the effect and limitations of segmentation on Chinese terminology extraction. A detailed study shows that errors propagated from word segmentation have a great impact on the results of terminology extraction. Our analysis and experiments show that character-based terminology extraction yields much better results than extraction that uses segmentation as a preprocessing step.