Chinese Word Segmentation Method for Domain-Special Machine Translation

SU Chen, ZHANG Yujie, GUO Zhen
2013-01-01
Abstract: In developing a domain-specific Chinese-English machine translation system, the accuracy of Chinese word segmentation on large-scale training corpora often decreases because of unknown words. The lack of domain-specific annotated corpora prevents supervised learning approaches from adapting to the new domain. This problem causes many errors in translation knowledge extraction and therefore seriously affects translation quality. To resolve the domain adaptation problem, we implemented Chinese word segmentation by exploiting n-gram statistical features in a raw corpus and bilingually motivated word segmentation information in a parallel corpus, respectively. We further propose a lattice-based method to combine the multiple results and use a dynamic programming algorithm to obtain the best word segmentation result. For evaluation, we conducted experiments on Chinese word segmentation and Chinese-English machine translation using the data of the NTCIR-10 Chinese-English patent task. The experimental results show that the proposed method improved both the F-measure of Chinese word segmentation and the BLEU score of the Chinese-English statistical machine translation system.
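The lattice combination step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how lattice edges are scored, so the sketch assumes a simple per-character voting score (the number of segmenters proposing a word, times the word's length), and the function names build_lattice and best_segmentation are hypothetical. Candidate words from several segmenters become edges over character positions, and dynamic programming selects the highest-scoring path through the lattice.

```python
from collections import Counter


def build_lattice(sentence, segmentations):
    """Merge candidate segmentations into a lattice.

    Each edge is a (start, end) character span; its value counts how many
    segmenters proposed that span as a word.
    """
    edges = Counter()
    for seg in segmentations:
        if "".join(seg) != sentence or any(not w for w in seg):
            raise ValueError("segmentation must cover the sentence exactly")
        pos = 0
        for word in seg:
            edges[(pos, pos + len(word))] += 1
            pos += len(word)
    return edges


def best_segmentation(sentence, segmentations):
    """Pick the best lattice path by dynamic programming.

    Assumed score (not from the paper): votes * word length per edge, so
    every full path is compared on the same number of characters.
    """
    edges = build_lattice(sentence, segmentations)
    n = len(sentence)
    best = [float("-inf")] * (n + 1)  # best[i]: best score of a path ending at i
    back = [None] * (n + 1)           # back[i]: start of the last word on that path
    best[0] = 0.0
    # Sorting by end position guarantees best[start] is final before it is used.
    for (start, end), votes in sorted(edges.items(), key=lambda e: e[0][1]):
        score = best[start] + votes * (end - start)
        if score > best[end]:
            best[end] = score
            back[end] = start
    # Follow the back-pointers from the end of the sentence to recover the words.
    words, pos = [], n
    while pos > 0:
        words.append(sentence[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


if __name__ == "__main__":
    sentence = "专利文献翻译"
    candidates = [
        ["专利", "文献", "翻译"],      # e.g. output of an n-gram based segmenter
        ["专利文献", "翻译"],          # e.g. output of a bilingually motivated segmenter
        ["专利", "文献", "翻", "译"],  # e.g. output of a baseline segmenter
    ]
    print(best_segmentation(sentence, candidates))
```

In practice the edge score would more plausibly come from segmenter confidences or a language model over words; the voting score here only keeps the sketch self-contained.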