Key Research of Pre-Processing on Mongolian-Chinese Neural Machine Translation

Jian Du, Hongxu Hou, Jing Wu, Zhipeng Shen, Jinting Li, Hongbin Wang
DOI: https://doi.org/10.2991/aiie-16.2016.1
2016-01-01
Abstract: Neural machine translation has recently achieved promising results on large-scale corpora, but there has been little research on small-scale corpora such as those available for Mongolian. Mongolian is an agglutinative language, while Chinese is written with pictographic characters, so both languages need some pre-processing before a machine translation model is trained. In this paper, we build an attention-based neural machine translation system for the CWMT2009 Mongolian-to-Chinese translation task. We apply four pre-processing approaches to Mongolian and Chinese: segmenting Chinese into characters, separating Mongolian stems from their suffixes, handling the case suffixes, and converting Mongolian into Latin script. We evaluate these approaches in a series of experiments. Our best system achieves a BLEU score of 29.56, which is 1.82 points higher than the baseline trained on the original Mongolian text and the general word segmentation of Chinese.
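Of the four pre-processing approaches named in the abstract, character-level segmentation of Chinese is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: the function names `is_cjk` and `segment_into_characters` are hypothetical, and the choice to keep runs of non-Chinese characters (digits, Latin letters, punctuation) together as single tokens is an assumption, since the abstract does not say how such tokens are handled.

```python
def is_cjk(ch: str) -> bool:
    """Return True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def segment_into_characters(sentence: str) -> str:
    """Split Chinese text into space-separated characters.

    Assumption (not from the paper): consecutive non-CJK, non-space
    characters are kept together as one token.
    """
    tokens = []
    buffer = []  # collects a run of non-CJK, non-space characters

    for ch in sentence:
        if is_cjk(ch):
            # Flush any pending non-CJK run, then emit the character alone.
            if buffer:
                tokens.append("".join(buffer))
                buffer = []
            tokens.append(ch)
        elif ch.isspace():
            # Whitespace only ends a pending non-CJK run.
            if buffer:
                tokens.append("".join(buffer))
                buffer = []
        else:
            buffer.append(ch)

    if buffer:
        tokens.append("".join(buffer))

    return " ".join(tokens)


if __name__ == "__main__":
    print(segment_into_characters("我爱机器翻译"))
```

Applied to every Chinese sentence in the training data, a step like this replaces word-level tokens with single characters, which shrinks the target vocabulary compared with general word segmentation.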