A Novel Deep Learning Method for Obtaining Bilingual Corpus from Multilingual Website

Shaolin Zhu, Xiao Li, Yating Yang, Lei Wang, Chenggang Mi
DOI: https://doi.org/10.1155/2019/7495436
IF: 1.43
2019-01-10
Mathematical Problems in Engineering
Abstract: Machine translation needs a large number of parallel sentence pairs to achieve good translation performance. However, the lack of parallel corpora severely limits machine translation for low-resource language pairs. We propose a novel method that combines continuous word embeddings with deep learning to obtain parallel sentences. Because parallel sentences are invaluable for low-resource language pairs, we introduce a cross-lingual semantic representation to induce bilingual signals. Our experiments show that the method achieves promising results for low-resource languages even when external resources are lacking. Finally, we build a state-of-the-art machine translation system for a low-resource language pair.
Computer Science