Unpaired Multimodal Neural Machine Translation via Reinforcement Learning

Yijun Wang, Tianxin Wei, Qi Liu, Enhong Chen
DOI: https://doi.org/10.1007/978-3-030-73197-7_11
2021-01-01
Abstract: End-to-end neural machine translation (NMT) relies heavily on parallel corpora for training. However, high-quality parallel corpora are usually costly to collect. To tackle this problem, multimodal content, especially images, has been introduced to help build NMT systems without parallel corpora. In this paper, we propose a reinforcement learning (RL) method that builds an NMT system by introducing a sequence-level supervision signal as a reward. Based on the premise that visual information can serve as a universal representation for grounding different languages, we design two rewards to guide the learning process: (1) the likelihood of the generated sentence given the source image, and (2) the distance between the attention weights produced by image captioning models. Experimental results on the Multi30K, IAPR-TC12, and IKEA datasets show that the proposed learning mechanism outperforms existing methods.
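The two sequence-level rewards and the RL objective described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the length normalisation in the likelihood reward, the use of a negative L1 distance between image-region attention distributions, the mixing weight, and the baseline are all assumptions made for illustration. Caption-model log-probabilities and attention weights are passed in as plain lists; in a real system they would come from pretrained image captioning models, and the loss would be computed on framework tensors so that gradients flow through the translation model's token log-probabilities.

```python
import math
from typing import Sequence


def likelihood_reward(caption_token_logprobs: Sequence[float]) -> float:
    """Reward (1), assumed form: length-normalised log-likelihood of the
    sampled target sentence under a target-language image caption model
    conditioned on the source image."""
    if not caption_token_logprobs:
        raise ValueError("sampled sentence is empty")
    return sum(caption_token_logprobs) / len(caption_token_logprobs)


def attention_reward(src_attn: Sequence[float], tgt_attn: Sequence[float]) -> float:
    """Reward (2), assumed form: negative L1 distance between the image-region
    attention distributions of the source- and target-language caption models.
    Closer attention (smaller distance) yields a higher reward."""
    if len(src_attn) != len(tgt_attn):
        raise ValueError("attention vectors must cover the same image regions")
    return -sum(abs(a - b) for a, b in zip(src_attn, tgt_attn))


def combined_reward(r_lik: float, r_att: float, weight: float = 0.5) -> float:
    """Hypothetical linear mix of the two rewards; `weight` is an assumption."""
    return weight * r_lik + (1.0 - weight) * r_att


def reinforce_loss(policy_token_logprobs: Sequence[float],
                   reward: float, baseline: float = 0.0) -> float:
    """REINFORCE surrogate loss for one sampled translation.

    Minimising -(reward - baseline) * log p(y|x) raises the probability of
    samples whose reward exceeds the baseline and lowers it otherwise.
    """
    advantage = reward - baseline
    return -advantage * sum(policy_token_logprobs)


if __name__ == "__main__":
    # Illustrative numbers only; real values come from the models.
    r1 = likelihood_reward([-1.0, -2.0, -3.0])
    r2 = attention_reward([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
    r = combined_reward(r1, r2)
    loss = reinforce_loss([-0.5, -0.5], reward=r, baseline=-1.5)
    print(f"r1={r1:.3f} r2={r2:.3f} reward={r:.3f} loss={loss:.3f}")
```

Subtracting a baseline (for example, a running average of recent rewards) leaves the expected policy gradient unchanged while reducing its variance, which is why the sketch exposes it as a parameter rather than fixing it to zero.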