RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation

Yan Wang,Yawen Zeng,Junjie Liang,Xiaofen Xing,Jin Xu,Xiangmin Xu
DOI: https://doi.org/10.1145/3652583.3658018
2024-01-01
Abstract:As an extension of machine translation, the primary objective of multi-modal machine translation is to optimize the utilization of visual information. Technically, image information is integrated into multi-modal fusion and alignment as an auxiliary modality through concepts or latent semantics, which are typically based on the Transformer framework. However, current approaches often ignore one modality to design numerous handcrafted features (e.g. visual concept extraction) and require training of all parameters in their framework. Therefore, it is worthwhile to explore multi-modal concepts or features to enhance performance and an efficient approach to incorporate visual information with minimal cost. Meanwhile, with the development of multi-modal large language models (MLLMs), they are faced with the visual hallucination issue of compromising performance, despite their powerful capabilities. Inspired by pioneering techniques in the multi-modal field, such as prompt learning and MLLMs, this paper innovatively explores the possibility of applying multi-modal prompt learning to this multi-modal machine translation task. Our framework offers three key advantages: it establishes a robust connection between visual concepts and translation processes, requires a minimum of 1.46M parameters for training, and can be seamlessly integrated into any existing framework by retrieving a multi-modal dictionary. Specifically, we propose two prompt-guided strategies: a learnable prompt-refined module and a heuristic prompt-refined module. Among them, the learnable strategy utilizes off-the-shelf pre-trained models, while the heuristic strategy constrains the hallucination problem via concept retrieval. Our experiments on two real-world benchmark datasets demonstrate that our proposed method outperforms all competitors.
What problem does this paper attempt to address?