Abstract: Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation. Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve a universal representation across non-shared modules, each of which serves a language or language family. The non-shared architecture has the advantage of mitigating internal language competition, especially when the shared vocabulary and model parameters are restricted in size. However, the zero-shot translation performance of multiple encoders and decoders still lags behind universal NMT. In this work, we study zero-shot translation using language-specific encoders-decoders. We propose to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers into language-specific and interlingua layers. By selectively sharing parameters and applying cross-attention, we explore maximizing representation universality and achieving the best alignment of language-agnostic information. We also introduce a denoising auto-encoding (DAE) objective to jointly train the model with the translation task in a multi-task manner. Experiments on two public multilingual parallel datasets show that our proposed model achieves competitive or better results than universal NMT and a strong pivot baseline. Moreover, we experiment with incrementally adding a new language to the trained model by updating only the new model parameters. With this little effort, zero-shot translation between the newly added language and existing languages achieves results comparable to those of a model trained jointly from scratch on all languages.
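
The layer-level sharing and the incremental addition of a language described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`Layer`, `LayerwiseSharedEncoder`), the method names, and the choice to place language-specific layers below the shared interlingua layers are all assumptions made for illustration, and each "layer" is only a named placeholder for a parameter set rather than a real Transformer layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Layer:
    """Placeholder for one Transformer layer (hypothetical; holds no weights)."""
    name: str
    trainable: bool = True


@dataclass
class LayerwiseSharedEncoder:
    """Encoder whose layers are split into per-language and shared stacks.

    Assumption: language-specific layers come first and the shared
    "interlingua" layers sit on top; the abstract does not fix this order.
    """
    n_specific: int
    n_shared: int
    specific: Dict[str, List[Layer]] = field(default_factory=dict)
    shared: List[Layer] = field(default_factory=list)

    def __post_init__(self) -> None:
        # One interlingua stack, created once and reused by every language.
        self.shared = [Layer(f"interlingua.{i}") for i in range(self.n_shared)]

    def all_layers(self) -> List[Layer]:
        layers = [l for stack in self.specific.values() for l in stack]
        return layers + self.shared

    def add_language(self, lang: str, freeze_existing: bool = False) -> None:
        """Create a new language-specific stack.

        With freeze_existing=True, every layer that already exists is frozen,
        so only the new language's parameters would be updated in training.
        """
        if freeze_existing:
            for layer in self.all_layers():
                layer.trainable = False
        self.specific[lang] = [Layer(f"{lang}.{i}") for i in range(self.n_specific)]

    def route(self, lang: str) -> List[str]:
        """Names of the layers an input in `lang` passes through, in order."""
        return [l.name for l in self.specific[lang] + self.shared]

    def trainable_names(self) -> List[str]:
        return [l.name for l in self.all_layers() if l.trainable]


# Joint training on two languages, then incremental addition of a third.
enc = LayerwiseSharedEncoder(n_specific=2, n_shared=1)
enc.add_language("en")
enc.add_language("de")
enc.add_language("fr", freeze_existing=True)
```

In this sketch, `enc.route("en")` and `enc.route("fr")` end in the same `interlingua.0` layer, which is where a shared representation would be learned, while `enc.trainable_names()` after the last call lists only the `fr.*` layers, mirroring the idea of updating only the new model parameters.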

Incorporating Language-specific Adapter into Multilingual Neural Machine Translation

Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Adaptive Adapters: an Efficient Way to Incorporate BERT into Neural Machine Translation

Language-aware Interlingua for Multilingual Neural Machine Translation

Pluggable Neural Machine Translation Models Via Memory-augmented Adapters

Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter

Adaptive Token-level Cross-lingual Feature Mixing for Multilingual Neural Machine Translation

Learn and Consolidate: Continual Adaptation for Zero-Shot and Multilingual Neural Machine Translation

Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation

Learning Domain Specific Sub-layer Latent Variable for Multi-Domain Adaptation Neural Machine Translation

Continual Learning for Multilingual Neural Machine Translation Via Dual Importance-based Model Division

Lightweight Adapter Tuning for Multilingual Speech Translation

Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

The Impact of Language Adapters in Cross-Lingual Transfer for NLU

Adaptation of Language Models for SMT Using Neural Networks with Topic Information

Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation

One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization

A General Framework for Adaptation of Neural Machine Translation to Simultaneous Translation