Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora

Hua Wu, Haifeng Wang, Chengqing Zong
DOI: https://doi.org/10.3115/1599081.1599206
2008-01-01
Abstract: Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text. In this paper, we propose a method to perform domain adaptation for statistical machine translation when in-domain bilingual corpora do not exist. This method first uses out-of-domain corpora to train a baseline system and then uses in-domain translation dictionaries and in-domain monolingual corpora to improve the in-domain performance. We propose an algorithm to combine these different resources in a unified framework. Experimental results indicate that our method achieves absolute improvements of 8.16 and 3.36 BLEU points on Chinese-to-English and English-to-French translation, respectively, compared with the baselines using only out-of-domain corpora.