Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation

Linda Zeng

2024-08-24

Abstract: Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack large-scale data corpora for models to use for training. As manual data curation is expensive and time-consuming, we propose utilizing a generative-adversarial network (GAN) to augment low-resource language data. When training on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential at data augmentation, generating monolingual language data with sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes the first step in investigating the capability of GANs in low-resource NMT, and our results suggest that there is promise for future extension of GANs to low-resource NMT.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the low translation quality of neural machine translation (NMT) for low-resource languages, which stems from the lack of large-scale data corpora. Low-resource languages are those without a large amount of digital data available for machine-learning algorithms to learn from, such as many Native American languages. This lack of data prevents a model from learning the syntactic and lexical patterns required for translation, resulting in inaccurate translations. The paper proposes using a generative adversarial network (GAN) to augment low-resource language data, with the goal of improving machine-translation quality for these languages. Trained in a simulated low-resource setting on fewer than 20,000 sentences, the model shows potential for data augmentation, generating monolingual sentences such as "ask me that healthy lunch im cooking up" and "my grandfather work harder than your grandfather before". The paper presents this approach as a first step in investigating the capability of GANs in low-resource NMT, and its results suggest there is promise for extending GANs to low-resource NMT in future work.
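The simulated low-resource setting can be illustrated with a minimal sketch that caps a larger corpus at fewer than 20,000 sentences. The page does not say how the training subset was selected, so the helper name `simulate_low_resource`, the random sampling, the deduplication step, and the toy corpus below are all assumptions made for illustration, not the paper's procedure.

```python
import random


def simulate_low_resource(sentences, max_sentences=20_000, seed=0):
    """Return a subset with fewer than `max_sentences` sentences.

    Hypothetical helper: the source only states that training used under
    20,000 sentences. Random sampling without replacement and the
    deduplication below are assumptions.
    """
    # Drop empty lines and exact duplicates, keeping first-seen order.
    unique = list(dict.fromkeys(s.strip() for s in sentences if s.strip()))
    limit = max_sentences - 1  # "under 20,000" means at most 19,999
    if len(unique) <= limit:
        return unique
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(unique, limit)


# Toy stand-in for a large monolingual corpus (not real data).
corpus = [f"sentence number {i}" for i in range(50_000)]
subset = simulate_low_resource(corpus)
print(len(subset))  # 19999
```

Fixing the seed keeps the subset identical across runs, which matters when comparing augmentation results against the same small training set.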

High-Quality Data Augmentation for Low-Resource NMT: Combining a Translation Memory, a GAN Generator, and Filtering

Data Augmentation for Low‐resource Languages NMT Guided by Constrained Sampling

Handling Syntactic Divergence in Low-resource Machine Translation

Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning

Improving Adversarial Neural Machine Translation for Morphologically Rich Language

A Scenario-Generic Neural Machine Translation Data Augmentation Method

An Efficient Method for Generating Synthetic Data for Low-Resource Machine Translation

Syntax-Aware Data Augmentation for Neural Machine Translation

Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation

Machine Translation in Low-Resource Languages by an Adversarial Neural Network

AUGNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation

Improvement in Machine Translation with Generative Adversarial Networks

Improving Data Augmentation for Low-Resource NMT Guided by POS-Tagging and Paraphrase Embedding

Generative Multimodal Data Augmentation for Low-Resource Multimodal Named Entity Recognition

Generating Adversarial Examples for Low-Resource NMT Via Multi-Reward Reinforcement Learning

Unsupervised Image-to-Image Translation with Generative Adversarial Networks

Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation

Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation

Random Concatenation: A Simple Data Augmentation Method for Neural Machine Translation

AdvAug: Robust Adversarial Augmentation for Neural Machine Translation