Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation

Linda Zeng
2024-08-24
Abstract: Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack large-scale data corpora for models to use for training. As manual data curation is expensive and time-consuming, we propose utilizing a generative-adversarial network (GAN) to augment low-resource language data. When training on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential at data augmentation, generating monolingual language data with sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes the first step in investigating the capability of GANs in low-resource NMT, and our results suggest that there is promise for future extension of GANs to low-resource NMT.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the low translation quality of neural machine translation (NMT) for low-resource languages, which lack large-scale data corpora. Low-resource languages are those with little digital text available for machine-learning models to train on; many Native American languages are examples. Without enough data, a model cannot learn the syntactic and lexical patterns that translation requires, so its translations are inaccurate.

Because manual data curation is expensive and time-consuming, the paper proposes using a generative adversarial network (GAN) to augment low-resource language data, with the goal of improving machine translation for these languages. Trained in a simulated low-resource setting on fewer than 20,000 sentences, the model shows potential for data augmentation, generating monolingual sentences such as "ask me that healthy lunch im cooking up" and "my grandfather work harder than your grandfather before." The paper presents this approach as a first step in investigating the capability of GANs for low-resource NMT, and its results suggest that extending GANs further in this area is promising.
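For reference, the standard GAN formulation trains a generator G against a discriminator D in a two-player minimax game. In the augmentation setting, x would be a real sentence (or its representation) drawn from the small low-resource corpus, and z a random noise vector that G maps to a synthetic sentence. This is the generic objective only; the summary does not say which loss, architecture, or text representation this paper uses, so treat it as background rather than as the author's exact training setup.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

D is trained to assign high probability to real sentences and low probability to generated ones, while G is trained to produce sentences that D accepts as real. Once training ends, samples G(z) serve as the augmented monolingual data. Because sampling discrete words is not differentiable, text GANs generally need a workaround, such as generating in a continuous embedding space, to pass gradients from D back to G.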