Abstract: Understanding representation transfer in multilingual neural machine translation can reveal the representational issue causing the zero-shot translation deficiency. In this work, we introduce the identity pair, a sentence translated into itself, to address the lack of the base measure in multilingual investigations, as the identity pair represents the optimal state of representation among any language transfers. In our analysis, we demonstrate that the encoder transfers the source language to the representational subspace of the target language instead of the language-agnostic state. Thus, the zero-shot translation deficiency arises because representations are entangled with other languages and are not transferred effectively to the target language. Based on our findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder. The experimental results on Europarl-15, TED-19, and OPUS-100 datasets show that our methods substantially enhance the performance of zero-shot translations by improving language transfer capacity, thereby providing practical evidence to support our conclusions.

What problem does this paper attempt to address?

This paper addresses the poor zero-shot translation performance of multilingual neural machine translation (MNMT) models. The authors study how representations are transferred in multilingual translation by introducing "identity pairs", in which a sentence is translated into itself. They find that when current MNMT models perform zero-shot translation, the source-language representation is not effectively transferred into the representation space of the target language. Instead, it remains entangled with the representations of other languages, which degrades translation quality.

### Main problems

1. **Insufficient zero-shot translation performance**:
   - Zero-shot translation means translating between a language pair that was not seen during training. Existing MNMT models perform poorly in this setting, mainly because the source-language representation is not effectively transferred into the representation space of the target language.
2. **Effectiveness of representation transfer**:
   - The authors find that the encoder transfers the source-language representation into the subspace of the target language rather than into a language-agnostic state. When this transfer is incomplete, the representation stays entangled with other languages, and zero-shot translation performance declines.

### Solutions

To improve zero-shot translation, the authors propose two methods:

1. **Low-Rank Language-specific Embedding (LOLE)**:
   - LOLE is applied on the encoder side. It introduces a learnable embedding vector that biases the representation towards the subspace of the target language. This improves the transfer ability of the representation and, in turn, zero-shot translation quality.
2. **Language-specific Contrastive Learning of Representations (LCLR)**:
   - LCLR is applied on the decoder side. It uses contrastive learning to isolate the representation spaces of different languages, which reduces representation entanglement and further improves translation performance.

### Experimental results

The authors ran experiments on three benchmark datasets: Europarl-15, TED-19, and OPUS-100. The results show that LOLE and LCLR substantially improve zero-shot translation performance by improving language transfer capacity.

### Conclusion

By introducing identity pairs and systematically analyzing representation transfer, the authors show that the encoder's main task in multilingual translation is to transfer the source-language representation into the subspace of the target language. Poor zero-shot performance stems from representation entanglement, and the proposed LOLE and LCLR methods alleviate this problem, thereby improving zero-shot translation.
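The two methods summarized above can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the summary does not give the exact formulation, so everything here is an assumption made for illustration. In particular, the function names, the factorization of the language vector as a coefficient row times a shared basis, the choice to add that vector to every encoder state, and the use of a supervised contrastive loss with language IDs as labels are all hypothetical.

```python
import math


def low_rank_language_embedding(lang_id, coeffs, basis):
    """Build a d-dimensional language vector as coeffs[lang_id] @ basis.

    Assumed shapes: coeffs is num_langs x r, basis is r x d, with r much
    smaller than d, so each language only learns r coefficients.
    """
    row = coeffs[lang_id]
    dim = len(basis[0])
    return [sum(row[k] * basis[k][j] for k in range(len(row))) for j in range(dim)]


def add_language_vector(hidden_states, lang_vec):
    """Shift every token representation by the target-language vector (assumed usage)."""
    return [[h + e for h, e in zip(token, lang_vec)] for token in hidden_states]


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-9)


def language_contrastive_loss(reps, langs, temperature=0.1):
    """Supervised contrastive loss that uses language IDs as labels.

    For each anchor, representations of the same language are positives and
    all other representations are negatives. Anchors without any positive
    are skipped.
    """
    losses = []
    for i in range(len(reps)):
        others = [j for j in range(len(reps)) if j != i]
        positives = [j for j in others if langs[j] == langs[i]]
        if not positives:
            continue
        exp_sims = {j: math.exp(cosine(reps[i], reps[j]) / temperature) for j in others}
        denom = sum(exp_sims.values())
        anchor_loss = -sum(math.log(exp_sims[p] / denom) for p in positives) / len(positives)
        losses.append(anchor_loss)
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch, representations that already cluster by language yield a lower loss than representations whose nearest neighbours belong to other languages, which mirrors the idea of reducing entanglement. A real model would compute both pieces on tensors inside the training loop and learn `coeffs` and `basis` by gradient descent.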

Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation

The Missing Ingredient in Zero-Shot Neural Machine Translation

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation

Improving Zero-Shot Multilingual Translation with Universal Representations and Cross-Mappings

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Improving Zero-shot Translation with Language-Independent Constraints

Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages

Improving Zero-Shot Translation of Low-Resource Languages

Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation

Zero-shot Cross-lingual Transfer is Under-specified Optimization

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Isotropic Representation Can Improve Zero-Shot Cross-Lingual Transfer on Multilingual Language Models

On Learning Language-Invariant Representations for Universal Machine Translation

How Do Multilingual Encoders Learn Cross-lingual Representation?

Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens

Language-Independent Representations Improve Zero-Shot Summarization

On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

Gender Lost In Translation: How Bridging The Gap Between Languages Affects Gender Bias in Zero-Shot Multilingual Translation

DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer

Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation

Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation