Abstract: Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between language pairs that are unseen in the training data. The common practice for guiding the zero-shot language mapping during inference is to deliberately insert the source and target language IDs, e.g., <EN> for English and <DE> for German. Recent studies have shown that language IDs sometimes fail to navigate the ZST task, so the model suffers from the off-target problem (words from a non-target language appear in the generated translation). This makes it difficult to apply current multilingual translation models to a broad range of zero-shot language scenarios. To understand when and why the navigation capability of language IDs weakens, we compare two extreme decoder input cases in the ZST directions: the Off-Target (OFF) and On-Target (ON) cases. By contrastively visualizing the contextual word representations (CWRs) of these cases with teacher forcing, we show that 1) when the sentence and the ID are matched (ON setting), the CWRs of different languages are effectively distributed in separate regions, and 2) when the sentence and the ID are unmatched (OFF setting), the CWRs of different languages are chaotically distributed. Our analyses suggest that although language IDs work well in the ideal ON setting, they become fragile and lose their navigation ability when faced with off-target tokens, which commonly occur during inference but are rare in training. In response, we employ unlikelihood tuning on the negative (OFF) samples to minimize their probability, so that the language IDs learn to discriminate between on- and off-target tokens during training. Experiments spanning 40 ZST directions show that our method reduces the off-target ratio by 48.0% on average, yielding a +9.1 BLEU improvement at an extra tuning cost of only 0.3%.
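The abstract describes the method only in words. As a rough illustration, the following Python sketch shows (a) how an ON and an OFF decoder input can share the same target-language ID, (b) a standard token-level unlikelihood term, -log(1 - p), applied to OFF tokens alongside the usual likelihood loss on ON tokens, and (c) a sentence-level off-target ratio. The function names, the weighting factor `alpha`, the additive combination of the two losses, the sentence-level definition of the ratio, and the caller-supplied language detector `detect` are all illustrative assumptions, not the paper's implementation. The model's per-token probabilities are taken as given rather than computed.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def lang_tag(lang: str) -> str:
    """Format a language ID token; the <XX> style follows the <EN>/<DE> examples."""
    return f"<{lang.upper()}>"


def build_decoder_inputs(
    target_lang: str, on_tokens: Sequence[str], off_tokens: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return (ON, OFF) decoder inputs that share the same target-language ID.

    ON: the ID matches the language of the following tokens.
    OFF: the same ID is followed by tokens in a different (off-target) language.
    """
    tag = lang_tag(target_lang)
    return [tag, *on_tokens], [tag, *off_tokens]


def likelihood_loss(probs: Sequence[float]) -> float:
    """Negative log-likelihood of the positive (ON) tokens under teacher forcing.

    `probs` holds the model's probability for each reference token, each in (0, 1].
    """
    return -sum(math.log(p) for p in probs)


def unlikelihood_loss(probs: Sequence[float], eps: float = 1e-8) -> float:
    """Token-level unlikelihood term for negative (OFF) tokens: -sum log(1 - p).

    Small when the model gives the off-target tokens little probability and
    large as that probability approaches 1; `eps` guards against log(0).
    """
    return -sum(math.log(max(1.0 - p, eps)) for p in probs)


def combined_loss(
    on_probs: Sequence[float], off_probs: Sequence[float], alpha: float = 1.0
) -> float:
    """Hypothetical tuning objective: NLL on ON tokens + alpha * unlikelihood on OFF tokens."""
    return likelihood_loss(on_probs) + alpha * unlikelihood_loss(off_probs)


def off_target_ratio(
    hypotheses: Sequence[str],
    target_lang: str,
    detect: Callable[[str], Optional[str]],
) -> float:
    """Fraction of hypotheses whose detected language differs from target_lang.

    `detect` is any sentence-level language identifier supplied by the caller
    (hypothetical here; no specific detector is implied).
    """
    if not hypotheses:
        return 0.0
    misses = sum(1 for h in hypotheses if detect(h) != target_lang)
    return misses / len(hypotheses)
```

For example, `build_decoder_inputs("de", ["Hallo", "Welt"], ["Hello", "world"])` returns `(["<DE>", "Hallo", "Welt"], ["<DE>", "Hello", "world"])`, an ON/OFF pair that shares the `<DE>` ID. The unlikelihood term rewards pushing probability away from the OFF tokens: `unlikelihood_loss([0.9])` is about 2.30, whereas `unlikelihood_loss([0.1])` is about 0.105.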