A new double attention decoding model based on cascade RCNN and word embedding fusion for Chinese-English multimodal translation

Haiying Liu
DOI: https://doi.org/10.1504/ijris.2024.137429
2024-01-01
International Journal of Reasoning-based Intelligent Systems
Abstract: Traditional multimodal machine translation (MMT) optimises translation from the source language to the target language with the help of important feature information in images. However, the information in the image does not necessarily appear in the text, which can interfere with the translation. As a result, the output can contain mistranslations relative to the reference translation. To solve these problems, we propose a double attention decoding method based on cascade RCNN to optimise existing multimodal neural machine translation models. The cascade RCNN is applied to the source language and the source image respectively. Word embedding is used to fuse the initialisation and the semantic information of the dual encoder. During attention computation, the method reduces the focus on repetitive information from earlier steps. Finally, experiments are carried out on Chinese-English test sets to verify the effectiveness of the proposed method. Compared with other state-of-the-art methods, the proposed method obtains better translation results.