The Missing Ingredient in Zero-Shot Neural Machine Translation

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey
DOI: https://doi.org/10.48550/arXiv.1903.07091
2019-03-17
Abstract: Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French-German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the poor zero-shot translation performance of multilingual neural machine translation (NMT) models. Existing multilingual NMT models translate well in supervised directions, but their quality drops significantly on language pairs that never appeared together in training. On such pairs they usually lag 2 to 10 BLEU points behind two-step translation through an intermediate language such as English (known as "pivoting" or "bridging"). This gap indicates that existing multilingual models generalize poorly to unseen language pairs. The paper first diagnoses why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to those pairs. The authors then propose auxiliary losses on the NMT encoder that encourage representational invariance across languages. The approach is simple and significantly improves zero-shot translation quality without degrading performance in supervised directions. Experiments show that it achieves, for the first time, zero-shot performance on par with pivoting on WMT14 English-French-German, and that it scales easily to more languages on the IWSLT 2017 shared task.
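The idea of an auxiliary encoder loss that encourages representational invariance can be illustrated with a minimal sketch. The summary above does not specify the form of the loss, so everything here is an assumption made for illustration: the encoder states of a sentence and its translation are max-pooled over time, the cosine distance between the two pooled vectors serves as the alignment penalty, and the function names (`max_pool`, `alignment_loss`, `total_loss`) and the `weight` parameter are hypothetical.

```python
import numpy as np

def max_pool(encoder_states):
    """Collapse a (seq_len, dim) array of encoder states into one (dim,) vector.
    Pooling choice is an assumption; the summary does not specify one."""
    return encoder_states.max(axis=0)

def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity: near 0 when u and v point the same way."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def alignment_loss(src_states, tgt_states):
    """Hypothetical auxiliary loss: distance between the pooled encoder
    representations of a sentence and its translation."""
    return cosine_distance(max_pool(src_states), max_pool(tgt_states))

def total_loss(translation_loss, src_states, tgt_states, weight=1.0):
    """Usual translation loss plus the weighted auxiliary penalty.
    `weight` is an illustrative hyperparameter, not a value from the paper."""
    return translation_loss + weight * alignment_loss(src_states, tgt_states)
```

If the encoder maps a sentence and its translation to the same pooled vector, the penalty vanishes and `total_loss` reduces to the translation loss; the further apart the two representations are, the larger the added term.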