Domain Adaptation for Medical Text Translation Using Web Resources

Yi Lu, Longyue Wang, Derek F. Wong, Lidia S. Chao, Yiming Wang
DOI: https://doi.org/10.3115/v1/w14-3328
2014-01-01
Abstract: This paper describes adapting statistical machine translation (SMT) systems to the medical domain using in-domain and general-domain data as well as web-crawled in-domain resources. To complement the limited in-domain corpora, we apply domain-focused web-crawling approaches to acquire in-domain monolingual data and a bilingual lexicon from the Internet. The collected data is used to adapt the language model and the translation model, improving overall translation quality. In addition, we propose an alternative filtering approach to clean the crawled data and further optimize the domain-specific SMT system. We participated in the medical summary sentence unconstrained translation task of the Ninth Workshop on Statistical Machine Translation (WMT2014). Our systems achieve the second-best BLEU scores for Czech-English, the fourth-best for the French-English and English-French language pairs, and the third-best results for the remaining pairs.
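The abstract does not specify the criterion used to filter the crawled data. Purely as an illustration of what domain-relevance filtering of crawled text can look like, here is a minimal sketch of a different, widely used technique: Moore-Lewis cross-entropy difference selection. It is not the authors' proposed method. Assumptions: add-one-smoothed unigram language models, lowercase whitespace tokenization, a threshold of 0.0, and the hypothetical function names `train_unigram`, `cross_entropy`, and `ce_difference_filter`. A sentence is kept when it is better predicted by the in-domain model than by the general-domain model.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Hypothetical helper: add-one-smoothed unigram model; returns a log-prob function."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a sentence under a unigram model."""
    toks = sentence.lower().split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)


def ce_difference_filter(candidates, in_domain, general, threshold=0.0):
    """Keep crawled sentences whose score H_in - H_gen is below the threshold.

    Lower scores mean the sentence looks more in-domain than general-domain.
    Kept sentences are returned sorted from most to least in-domain.
    """
    lp_in = train_unigram(in_domain)
    lp_gen = train_unigram(general)
    kept = []
    for s in candidates:
        score = cross_entropy(lp_in, s) - cross_entropy(lp_gen, s)
        if score < threshold:
            kept.append((score, s))
    return [s for _, s in sorted(kept)]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    in_domain = ["the patient received a dose of insulin",
                 "clinical trial of the drug showed side effects"]
    general = ["the match ended in a draw",
               "the stock market fell sharply today"]
    crawled = ["the drug dose was reduced for the patient",
               "the market rallied after the match"]
    print(ce_difference_filter(crawled, in_domain, general))
```

On this toy data only the medical sentence survives. A real system would use higher-order n-gram models trained on much larger corpora and tune the threshold on held-out data.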