Mongolian-Chinese Unsupervised Neural Machine Translation With Lexical Feature

Ziyu Wu, Hongxu Hou, Ziyue Guo, Xuejiao Wang, Shuo Sun
DOI: https://doi.org/10.1007/978-3-030-32381-3_27
2019-01-01
Abstract: Machine translation has achieved impressive performance with the advances in deep learning, but it relies on large-scale parallel corpora. There have been many attempts to extend these successes to low-resource languages, yet they still require large numbers of parallel sentences. In this study, we build a Mongolian-Chinese neural machine translation model based on unsupervised methods. Cross-lingual word embedding training plays a crucial role in unsupervised machine translation. Training methods based on generative adversarial networks (GANs) only perform well between two closely related languages, whereas the self-learning method can learn high-quality bilingual embedding mappings without any parallel corpora for low-resource languages. In this work, applying the self-learning method instead of GANs improves the BLEU score by 1.0. On this basis, we analyze the lexical features of Mongolian words and use stem-affix segmentation in Mongolian to replace the Byte Pair Encoding (BPE) operation, so that cross-lingual word embedding training is more accurate and yields higher-quality bilingual word embeddings that enhance translation performance. We report a BLEU score of 15.2 on the CWMT2017 Mongolian-Chinese dataset, without using any parallel corpora during training.
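To make the idea of stem-affix segmentation concrete, the following is a minimal sketch of a greedy suffix stripper that could stand in for BPE as a preprocessing step. It is not the authors' segmenter: the suffix inventory `TOY_SUFFIXES`, the `+` affix marker, the `min_stem` threshold, and the example strings are all hypothetical placeholders chosen for illustration, not a real Mongolian affix list.

```python
# Hypothetical, made-up suffix inventory (transliteration-style strings).
# A real system would use a linguistically motivated Mongolian affix list.
TOY_SUFFIXES = ["uud", "iin", "aas", "d"]


def segment(word, suffixes=TOY_SUFFIXES, min_stem=2):
    """Greedily strip known suffixes from the right end of a word.

    Returns a list [stem, "+affix1", "+affix2", ...] in surface order.
    The "+" marker is an arbitrary choice for this sketch.
    """
    affixes = []
    stem = word
    stripped = True
    while stripped:
        stripped = False
        # Try longer suffixes first so "uud" wins over "d".
        for suf in sorted(suffixes, key=len, reverse=True):
            # Only strip if at least min_stem characters remain as the stem.
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                affixes.insert(0, "+" + suf)
                stem = stem[: -len(suf)]
                stripped = True
                break
    return [stem] + affixes


def segment_sentence(sentence):
    """Segment every whitespace-separated token of a sentence."""
    return " ".join(tok for w in sentence.split() for tok in segment(w))


if __name__ == "__main__":
    print(segment("nomuudiin"))
    print(segment_sentence("nomuudiin ad"))
```

Under these assumptions, each word is split into one stem plus a short sequence of affix tokens, so the embedding vocabulary is built from stems and affixes rather than from statistically learned BPE units.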