Sentence Augmentation for Language Translation Using GPT-2

Ranto Sawai, Incheon Paik, Ayato Kuwana
DOI: https://doi.org/10.3390/electronics10243082
IF: 2.9
2021-12-10
Electronics
Abstract: Data augmentation has recently become an important method for improving performance in deep learning. It is also a significant issue in machine translation, where techniques such as back-translation and noising have been developed. In particular, current state-of-the-art approaches, such as the BERT-fused model architecture and efficient data generation with GPT models, offer useful inspiration for improving translation performance. In this study, we propose generating additional training data for neural machine translation (NMT) with a GPT-2-based sentence generator that produces sentences with characteristics similar to those of the original data. The BERT-fused architecture and back-translation are employed for the translation model. In our experiments, the model achieved BLEU scores of 27.50 on Tatoeba En-Ja, 30.14 on WMT14 En-De, and 24.12 on WMT18 En-Ch.
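The abstract does not spell out how the GPT-2-generated sentences and back-translation fit together. The sketch below assumes one plausible arrangement: a generator produces target-language sentences similar to those already in the corpus, and a reverse (target-to-source) model back-translates each one to create a synthetic source side. `augment_corpus`, `toy_generator`, and `toy_back_translate` are hypothetical names. The toy stubs only stand in for a real GPT-2 sampler and a real reverse NMT model; this is not the authors' implementation.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def augment_corpus(
    parallel: List[Pair],
    generate_similar: Callable[[str], Iterable[str]],
    back_translate: Callable[[str], str],
    max_new_per_sentence: int = 2,
) -> List[Pair]:
    """Return the original pairs followed by synthetic pairs.

    Hypothetical pipeline: for each target-side sentence, ask a generator
    (stand-in for GPT-2 sampling) for similar sentences, then build a
    synthetic source side by back-translating each generated sentence.
    """
    seen_targets = {tgt for _, tgt in parallel}
    synthetic: List[Pair] = []
    for _, tgt in parallel:
        count = 0
        for candidate in generate_similar(tgt):
            if count >= max_new_per_sentence:
                break
            candidate = candidate.strip()
            # Skip empty outputs and sentences already present in the corpus.
            if not candidate or candidate in seen_targets:
                continue
            seen_targets.add(candidate)
            synthetic.append((back_translate(candidate), candidate))
            count += 1
    return parallel + synthetic


def toy_generator(sentence: str) -> List[str]:
    # Hypothetical stand-in for GPT-2: returns lightly perturbed variants.
    return [sentence, sentence.replace("cat", "dog"), sentence + " today"]


def toy_back_translate(sentence: str) -> str:
    # Hypothetical stand-in for a reverse NMT model: tags the sentence
    # so the synthetic source side is easy to spot.
    return "<bt> " + sentence


corpus = [("neko ga iru", "there is a cat")]
augmented = augment_corpus(corpus, toy_generator, toy_back_translate)
print(augmented)
```

Deduplicating against the existing targets keeps the generator from simply echoing training sentences back, which would add pairs without adding variety. The per-sentence cap is an assumed knob for controlling the ratio of synthetic to real data.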
Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
What problem does this paper attempt to address?