The Missing Ingredient in Zero-Shot Neural Machine Translation

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

DOI: https://doi.org/10.48550/arXiv.1903.07091

2019-03-17

Abstract: Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French/German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper addresses the poor zero-shot translation performance of multilingual neural machine translation (NMT) models. Existing multilingual NMT models translate well in supervised directions, but their quality drops sharply on language pairs that were never seen together during training. On such pairs they usually trail two-step translation through an intermediate language such as English (known as "pivoting" or "bridging") by 2 to 10 BLEU points, which shows that these models generalize poorly to unseen language pairs. The paper first diagnoses why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize in this way. The authors then propose auxiliary losses on the NMT encoder that encourage representational invariance across languages. The approach is simple and significantly improves zero-shot translation quality without degrading performance in supervised directions. Experiments show that, for the first time, zero-shot translation on WMT14 English-French-German reaches quality comparable to pivoting, and that the approach extends easily to more languages.
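To make the idea of an auxiliary encoder loss concrete, here is a minimal sketch of one possible form it could take. The summary above does not spell out the loss, so everything specific here is an assumption made for illustration: the mean pooling over tokens, the cosine distance, the `weight` parameter, and all function names are hypothetical and are not presented as the authors' implementation. The sketch takes the encoder states of a sentence and of its translation, pools each into a single vector, and penalizes the distance between the two vectors on top of the usual translation loss.

```python
import math


def mean_pool(states):
    """Average per-token vectors into a single sentence vector.

    Assumption: mean pooling is used here only for illustration.
    """
    dim = len(states[0])
    return [sum(tok[i] for tok in states) / len(states) for i in range(dim)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def alignment_loss(src_states, tgt_states):
    """Hypothetical auxiliary loss: distance between pooled encodings
    of a sentence and its translation."""
    return cosine_distance(mean_pool(src_states), mean_pool(tgt_states))


def total_loss(translation_loss, src_states, tgt_states, weight=1.0):
    """Translation loss plus the weighted auxiliary term.

    `weight` is an illustrative hyperparameter, not a value from the paper.
    """
    return translation_loss + weight * alignment_loss(src_states, tgt_states)


# Toy encoder outputs: two tokens on the source side, one on the target side.
english_states = [[1.0, 0.0], [1.0, 0.0]]
french_states = [[0.0, 2.0]]
combined = total_loss(2.0, english_states, french_states, weight=0.5)
```

In this toy example the two pooled vectors are orthogonal, so the auxiliary term is at its midpoint value of 1 and raises the combined loss; identical encodings would contribute nothing. Driving this term down during training is one way an encoder could be pushed toward language-independent representations.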
