Abstract: Several methods have recently been proposed to improve the performance of low-resource Neural Machine Translation (NMT). However, these techniques have not yet been explored thoroughly for the low-resource Thai and Myanmar languages. We therefore first applied data augmentation techniques such as SwitchOut and Ciphertext-Based Data Augmentation (CipherDAug) to improve NMT performance for these languages. Second, we fine-tuned the pre-trained Multilingual Denoising BART model (mBART), where BART stands for Bidirectional and Auto-Regressive Transformer. We implemented three NMT systems, namely Transformer+SwitchOut, Multi-source Transformer+CipherDAug, and fine-tuned mBART, for bidirectional translation among the Thai-English-Myanmar language pairs of the ASEAN-MT corpus. Experimental results showed that Multi-source Transformer+CipherDAug significantly improved BLEU, ChrF, and TER scores over both the first baseline (Transformer) and the second baseline (Edit-Based Transformer, EDITOR). It achieved BLEU scores of 37.9 (English-to-Thai), 42.7 (Thai-to-English), 28.9 (English-to-Myanmar), 31.2 (Myanmar-to-English), 25.3 (Thai-to-Myanmar), and 25.5 (Myanmar-to-Thai). The fine-tuned mBART model also considerably outperformed both baselines in every direction except Myanmar-to-English. SwitchOut improved over the second baseline for all pairs and performed similarly to the first baseline in most cases. Finally, detailed analyses indicate that the CipherDAug and mBART models can help improve low-resource NMT performance for Thai and Myanmar.
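The abstract names SwitchOut and CipherDAug without describing how they generate training data. The Python below is a minimal sketch of each idea under stated assumptions, not the authors' implementation. `switchout` draws a replacement count n with probability proportional to exp(-n/tau) and swaps that many random positions for uniformly sampled vocabulary words. `rot_k` applies a ROT-k shift cipher, and `cipher_augment` (a hypothetical helper name) pairs each enciphered copy of a source sentence with the unchanged target, which is the core of ciphertext-based augmentation. The temperature `tau`, the shifts `ks`, and the lowercase Latin alphabet are illustrative assumptions; how the authors enciphered Thai and Myanmar script is not stated in the abstract.

```python
import math
import random


def switchout(tokens, vocab, tau=1.0, rng=random):
    """Return a SwitchOut-style corrupted copy of `tokens`.

    The number of tokens to replace, n in {0, ..., len(tokens)}, is drawn with
    probability proportional to exp(-n / tau); n distinct positions are then
    replaced by words drawn uniformly from `vocab`.
    """
    length = len(tokens)
    if length == 0:
        return []
    # Unnormalised weights for n = 0..length; choices() normalises them.
    weights = [math.exp(-n / tau) for n in range(length + 1)]
    n = rng.choices(range(length + 1), weights=weights, k=1)[0]
    corrupted = list(tokens)
    # Pick n distinct positions and overwrite each with a random vocabulary word.
    for i in rng.sample(range(length), n):
        corrupted[i] = rng.choice(vocab)
    return corrupted


LATIN = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet, for illustration only


def rot_k(sentence, k, alphabet=LATIN):
    """Shift each character of `alphabet` by k positions (ROT-k cipher);
    characters outside the alphabet, such as spaces, are kept unchanged."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    size = len(alphabet)
    return "".join(
        alphabet[(index[ch] + k) % size] if ch in index else ch
        for ch in sentence
    )


def cipher_augment(pairs, ks=(1, 2)):
    """Hypothetical helper: keep each (source, target) pair and add one
    enciphered copy of the source per shift k, paired with the same target."""
    augmented = []
    for src, tgt in pairs:
        augmented.append((src, tgt))
        for k in ks:
            augmented.append((rot_k(src, k), tgt))
    return augmented
```

For example, `rot_k("hello world", 1)` yields `"ifmmp xpsme"`. A small `tau` keeps most sentences unchanged, while a larger one corrupts more tokens. The multi-source model training that the abstract pairs with CipherDAug is outside the scope of this sketch.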

Boosting English-Amharic machine translation using corpus augmentation and Transformer

Low Resource Arabic Dialects Transformer Neural Machine Translation Improvement through Incremental Transfer of Shared Linguistic Features

Extended Parallel Corpus for Amharic-English Machine Translation

Enhancing Neural Machine Translation of Low-Resource Languages: Corpus Development, Human Evaluation and Explainable AI Architectures

Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation

A Study for Enhancing Low-resource Thai-Myanmar-English Neural Machine Translation

A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages

High-Quality Data Augmentation for Low-Resource NMT: Combining a Translation Memory, a GAN Generator, and Filtering

Context based machine translation with recurrent neural network for English–Amharic translation

Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation

Effective General-Domain Data Inclusion for the Machine Translation Task by Vanilla Transformers

Transferring Monolingual Model to Low-Resource Language: The Case of Tigrinya

Enhanced Transformer Architecture for Natural Language Processing

Improving Data Augmentation for Low-Resource NMT Guided by POS-Tagging and Paraphrase Embedding

AdvAug: Robust Adversarial Augmentation for Neural Machine Translation

Efficient incremental training using a novel NMT-SMT hybrid framework for translation of low-resource languages

Hindi to English: Transformer-Based Neural Machine Translation

Improving Machine Translation with Phrase Pair Injection and Corpus Filtering

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

Back Translation Survey for Improving Text Augmentation

Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition