Uyghur-Chinese statistical machine translation by incorporating morphological information

Batuer Aisha, Maosong Sun
2010-01-01
Journal of Computational Information Systems
Abstract: This paper presents a method for machine translation from Uyghur, an agglutinative language with very productive inflectional and derivational morphology, to Chinese, by incorporating morphological information into a statistical machine translation model. The basic idea is that agglutinated suffixes must be handled carefully to produce correct translations, because they play important roles in the Uyghur language. Experimental results show that morphological decomposition of the Uyghur source is beneficial, especially for smaller training corpora. The BLEU score improves from 13.61 to 25.26 when the input data is tokenized, compared with untokenized input. © 2010 Binary Information Press October, 2010.
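The abstract does not describe the segmenter itself, so the following is only a minimal sketch of what morphological decomposition of Uyghur source text might look like before it is passed to a statistical translation system. The suffix inventory, the `+` marker, the `min_stem` threshold, and the function names `segment` and `tokenize` are all illustrative assumptions, not the authors' method. Words are written in Latin-script Uyghur for readability.

```python
# Toy suffix inventory (assumption): a handful of common Uyghur
# plural and case suffixes. A real system would use a full
# morphological analyzer rather than a fixed list.
SUFFIXES = ["ning", "lar", "ler", "din", "tin", "ni", "da", "de", "ta", "te"]


def segment(word, min_stem=3):
    """Greedily strip known suffixes from the right, longest first.

    Returns the stem followed by the stripped suffixes in surface order,
    each suffix prefixed with '+' so it can be told apart from a stem.
    """
    suffixes = []
    stripped = True
    while stripped:
        stripped = False
        for suf in sorted(SUFFIXES, key=len, reverse=True):
            # Keep at least `min_stem` characters so short words
            # are not reduced to nothing.
            if word.endswith(suf) and len(word) - len(suf) >= min_stem:
                suffixes.insert(0, "+" + suf)
                word = word[: -len(suf)]
                stripped = True
                break
    return [word] + suffixes


def tokenize(sentence):
    """Segment every whitespace-separated word in a sentence."""
    return " ".join(tok for w in sentence.split() for tok in segment(w))


if __name__ == "__main__":
    # kitab-lar-ni: book-PLURAL-ACCUSATIVE; mektep-te: school-LOCATIVE
    print(tokenize("kitablarni mektepte"))
```

Splitting suffixes into separate tokens lets an alignment model pair each one with a Chinese function word or marker, and it reduces the number of distinct source word forms, which is one plausible reason the reported gain is larger for smaller training corpora. Greedy stripping of this kind can over-segment stems that happen to end in a suffix-like string, so it should be read as an illustration only.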