Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision
Xiayang Shi, Ping Yue, Xinyi Liu, Chun Xu, Lin Xu
DOI: https://doi.org/10.1155/2022/5296946
IF: 3.12
2022-08-05
Computational Intelligence and Neuroscience
Abstract: Machine translation relies on parallel sentences, and the number available is an important factor in the performance of machine translation systems, especially for low-resource languages. Recent advances in learning cross-lingual word representations from nonparallel data open a new possibility for obtaining bilingual sentences with minimal supervision in low-resource languages. In this paper, we introduce a novel methodology for obtaining parallel sentences using only a small bilingual seed lexicon of a few hundred entries. We first obtain bilingual semantic representations by using the seed lexicon to learn a cross-lingual mapping between the monolingual word representations of the two languages. Then, we construct a deep learning classifier to extract bilingual parallel sentences. We demonstrate the effectiveness of our methodology by harvesting Uyghur-Chinese parallel sentences and building a machine translation system from them. The experiments indicate that our method can obtain a large number of high-accuracy parallel sentences in low-resource language pairs.
mathematical & computational biology,neurosciences
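The abstract's first step, learning a cross-lingual mapping from a seed lexicon, can be illustrated with a minimal sketch. The abstract does not say how the mapping is learned, so the orthogonal Procrustes solution used here is an assumption (a commonly used technique for aligning embedding spaces), and the function names `learn_mapping`, `sentence_vector`, and `candidate_score` are hypothetical. The cosine score at the end only stands in for the kind of signal a sentence-pair classifier might consume; it is not the deep learning classifier described in the paper.

```python
import numpy as np


def learn_mapping(src_vecs, tgt_vecs):
    """Orthogonal Procrustes (assumed technique, not confirmed by the paper).

    Row i of src_vecs and row i of tgt_vecs are the embeddings of the
    i-th seed-lexicon pair. Returns the orthogonal matrix W that
    minimizes ||src_vecs @ W - tgt_vecs||_F.
    """
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt


def sentence_vector(tokens, emb):
    """Unit-length average of the known word vectors, or None if no token is known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    mean = np.mean(vecs, axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else None


def candidate_score(src_tokens, tgt_tokens, src_emb, tgt_emb, W):
    """Cosine similarity between the mapped source sentence and the target sentence."""
    s = sentence_vector(src_tokens, src_emb)
    t = sentence_vector(tgt_tokens, tgt_emb)
    if s is None or t is None:
        return 0.0
    # W is orthogonal, so the mapped vector keeps unit length.
    return float((s @ W) @ t)
```

In this sketch, `src_emb` and `tgt_emb` are plain dictionaries from tokens to NumPy vectors trained separately on monolingual text. Candidate pairs with a high score would then be passed to a trained classifier rather than accepted on the similarity alone.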