Abstract: Domain-specific Multi-modal Neural Machine Translation (DMNMT) aims to translate domain-specific sentences from a source language to a target language by incorporating text-related visual information. Domain-specific text and image data often complement each other and can jointly enhance the representation of domain-specific information. However, there is a considerable modality gap between image and text in both data format and semantic expression, which poses distinctive challenges for domain-specific translation. Narrowing this modality gap and improving domain-aware representation are therefore the two critical challenges in DMNMT. To this end, this paper proposes a progressive modality-complement aggregative MultiTransformer, which aims to simultaneously narrow the modality gap and capture domain-specific multi-modal representations. We first adopt a bidirectional progressive cross-modal interactive strategy that narrows the text-to-text, text-to-visual, and visual-to-text semantic gaps in the multi-modal representation space by integrating visual and text information layer by layer. We then introduce a modality-complement MultiTransformer, built on this progressive cross-modal interaction, to extract domain-related multi-modal representations and thereby enhance machine translation performance. Experiments on the Fashion-MMT and Multi-30k datasets show that the proposed approach outperforms the compared state-of-the-art (SOTA) methods on the En-Zh task in the e-commerce domain and on the En-De, En-Fr, and En-Cs tasks of Multi-30k in the general domain. In-depth analysis confirms the validity of the proposed modality-complement MultiTransformer and the bidirectional progressive cross-modal interactive strategy for DMNMT.

Progressive modality-complement aggregative multitransformer for domain multi-modal neural machine translation
