Abstract: Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation. Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve a universal representation across non-shared modules, each of which serves a single language or language family. The non-shared architecture has the advantage of mitigating internal language competition, especially when the shared vocabulary and model parameters are restricted in size. However, the zero-shot translation performance of multiple encoders and decoders still lags behind that of universal NMT. In this work, we study zero-shot translation using language-specific encoders-decoders. We propose to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers into language-specific and interlingua layers. By selectively sharing parameters and applying cross-attention, we explore maximizing representation universality and achieving the best alignment of language-agnostic information. We also introduce a denoising auto-encoding (DAE) objective to train the model jointly with the translation task in a multi-task manner. Experiments on two public multilingual parallel datasets show that our proposed model achieves results that are competitive with or better than universal NMT and a strong pivot baseline. Moreover, we experiment with incrementally adding a new language to the trained model by updating only the new model parameters. With this little effort, zero-shot translation between the newly added language and existing languages achieves results comparable to those of a model trained jointly from scratch on all languages.
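The layer split and the incremental addition of a language described in the abstract can be pictured with a small bookkeeping sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `Layer`, `SharedStackModel`, `add_language`, and `encoder_stack` are hypothetical, the layer counts are arbitrary, and placing the language-specific layers below the shared interlingua layers is an assumption that the abstract does not confirm. The sketch tracks only which parameter groups exist and which would be updated; it performs no actual translation.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Stand-in for one Transformer layer's parameters (hypothetical)."""
    name: str
    trainable: bool = True


@dataclass
class SharedStackModel:
    """Tracks language-specific layers per language plus shared interlingua layers."""
    n_specific: int  # language-specific layers per language (assumed value)
    n_shared: int    # interlingua layers shared by all languages (assumed value)
    interlingua: list = field(default_factory=list)
    specific: dict = field(default_factory=dict)

    def __post_init__(self):
        # One set of interlingua layers, created once and reused by every language.
        self.interlingua = [Layer(f"interlingua.{i}") for i in range(self.n_shared)]

    def add_language(self, lang, freeze_existing=False):
        """Create language-specific layers; optionally freeze everything that already exists."""
        if freeze_existing:
            for layer in self.interlingua:
                layer.trainable = False
            for layers in self.specific.values():
                for layer in layers:
                    layer.trainable = False
        self.specific[lang] = [Layer(f"{lang}.{i}") for i in range(self.n_specific)]

    def encoder_stack(self, lang):
        """Layers applied to input in `lang`: its own layers, then the shared ones (assumed order)."""
        return self.specific[lang] + self.interlingua

    def trainable_names(self):
        layers = self.interlingua + [l for ls in self.specific.values() for l in ls]
        return [l.name for l in layers if l.trainable]


# Joint training on two languages, then incremental addition of a third.
model = SharedStackModel(n_specific=2, n_shared=2)
for lang in ("en", "de"):
    model.add_language(lang)
model.add_language("fr", freeze_existing=True)
print(model.trainable_names())  # ['fr.0', 'fr.1']
```

In this sketch the interlingua layers are the same objects in every language's stack, which is what "selectively sharing parameters" amounts to at the bookkeeping level, and adding `fr` with `freeze_existing=True` leaves only the new language-specific layers trainable.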

Zero-shot Cross-lingual Transfer is Under-specified Optimization

Improving Zero-Shot Translation of Low-Resource Languages

A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

Improving Zero-Shot Multilingual Translation with Universal Representations and Cross-Mappings

Cross-lingual Transfer of Monolingual Models

Improving Zero-shot Translation with Language-Independent Constraints

Improving Zero-shot Multilingual Neural Machine Translation by Leveraging Cross-lingual Consistency Regularization

One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance

Transductive Unbiased Embedding for Zero-Shot Learning

Understanding and Mitigating the Uncertainty in Zero-Shot Translation

Generalization Measures for Zero-Shot Cross-Lingual Transfer

The Missing Ingredient in Zero-Shot Neural Machine Translation

Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models

On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters

An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models

Zero-Shot Cross-Lingual Transfer in Legal Domain Using Transformer Models

Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation