Adapting Chinese Word Segmentation for Translation by Using a Bilingual Dictionary

Hailong Cao, Masao Utiyama, Eiichiro Sumita
2009-01-01
Abstract: This paper proposes a method for adapting Chinese word segmentation to statistical machine translation. Two sources of information are used to segment the Chinese sentences in the Chinese-English bilingual corpus that serves as the training set of the machine translation model. The first is a manually segmented monolingual corpus of the kind widely used by general-purpose segmenters. The second is the information hidden in the corresponding English sentences. To use the English information without performing word alignment, which is time-consuming, we exploit a bilingual dictionary in a dynamic way. We demonstrate the usefulness of our approach on a Chinese-to-English translation task in both a small-data and a large-data environment.
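The abstract does not spell out how the dictionary is applied, so the following is only a toy sketch of one way the idea could work, not the authors' algorithm. It assumes that using the dictionary "in a dynamic way" means activating, for each sentence pair, only those dictionary entries whose English translation occurs in the paired English sentence, and then preferring segmentations built from the activated words. The function name `segment`, the squared-length score, the `max_len` limit, and the sample dictionary are all hypothetical, and the monolingual-corpus component of the method is omitted entirely.

```python
def segment(chinese, english, bilingual_dict, max_len=4):
    """Toy segmenter: prefer words whose translation appears in `english`."""
    english_words = set(english.lower().split())

    # Entries "activated" by this sentence pair (assumed reading of "dynamic").
    active = {
        zh for zh, translations in bilingual_dict.items()
        if any(en in english_words for en in translations)
    }

    n = len(chinese)
    # best[i] holds (score, words) for the best segmentation of chinese[:i].
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = chinese[j:i]
            # Multi-character words are allowed only if activated;
            # single characters are always allowed as a fallback.
            if len(word) > 1 and word not in active:
                continue
            # Hypothetical score: reward activated words, longer ones more.
            gain = len(word) ** 2 if word in active else 0
            candidate = (best[j][0] + gain, best[j][1] + [word])
            if best[i] is None or candidate[0] > best[i][0]:
                best[i] = candidate
    return best[n][1]


# Hypothetical dictionary; "欢机" is a distractor with no matching translation.
toy_dict = {
    "我": ["i"],
    "喜欢": ["like"],
    "机器": ["machine"],
    "翻译": ["translation", "translate"],
    "欢机": ["nonsense"],
}

print(segment("我喜欢机器翻译", "I like machine translation", toy_dict))
```

With the matching English sentence, the sketch groups characters into the activated dictionary words; with an unrelated English sentence, no entry is activated and it falls back to single characters. A realistic system would combine such evidence with a segmenter trained on the manually segmented corpus rather than rely on dictionary matches alone.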