Abstract: Multilingual Neural Machine Translation (MNMT) models are commonly trained on a joint set of bilingual corpora that is acutely English-centric (i.e., English is either the source or the target language). While direct data between two non-English languages is sometimes explicitly available, it is rarely used. In this paper, we first take a step back and look at the commonly used bilingual corpora (WMT), and resurface the existence and importance of an implicit structure within them: multi-way alignment across examples (the same sentence in more than two languages). We set out to study the use of multi-way aligned examples to enrich the original English-centric parallel corpora. We reintroduce this direct parallel data from multi-way aligned corpora between all source and target languages. By doing so, the English-centric graph expands into a complete graph, with every language pair connected. We call MNMT with this connectivity pattern complete Multilingual Neural Machine Translation (cMNMT) and demonstrate its utility and efficacy with a series of experiments and analyses. In combination with a novel training data sampling strategy that is conditioned on the target language only, cMNMT yields competitive translation quality for all language pairs. We further study the effect of the size of the multi-way aligned data, its transfer learning capabilities, and how it eases adding a new language to MNMT. Finally, we stress test cMNMT at scale and demonstrate that we can train a cMNMT model with up to 111*112=12,432 language pairs that provides competitive translation quality for all language pairs.
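
The expansion from an English-centric graph to a complete graph can be illustrated with a small sketch. This is not the authors' pipeline: it assumes, purely for illustration, that multi-way alignment is found by exact matching on the shared English sentence, and the names `multiway_align`, `direct_pairs`, and `num_directed_pairs` are hypothetical. The last helper only reproduces the pair count quoted in the abstract, n(n-1) directed pairs for n languages.

```python
from collections import defaultdict
from itertools import permutations


def multiway_align(bitexts):
    """Group English-centric bitext by the shared English sentence.

    bitexts maps a non-English language code to a list of
    (english_sentence, foreign_sentence) pairs. Exact string matching on
    the English side is an assumption made for this sketch.
    """
    groups = defaultdict(dict)
    for lang, pairs in bitexts.items():
        for english, foreign in pairs:
            groups[english]["en"] = english
            groups[english][lang] = foreign
    return groups


def direct_pairs(groups):
    """List (src_lang, tgt_lang, src_sentence, tgt_sentence) for every
    ordered pair of languages that share an aligned example."""
    examples = []
    for sentences in groups.values():
        for src, tgt in permutations(sorted(sentences), 2):
            examples.append((src, tgt, sentences[src], sentences[tgt]))
    return examples


def num_directed_pairs(n_languages):
    """Number of directed edges in a complete graph over n languages."""
    return n_languages * (n_languages - 1)


if __name__ == "__main__":
    toy = {
        "de": [("Hello", "Hallo"), ("Thanks", "Danke")],
        "fr": [("Hello", "Bonjour")],
    }
    for example in direct_pairs(multiway_align(toy)):
        print(example)
    print(num_directed_pairs(112))
```

In the toy data, "Hello" appears in both the German and French bitexts, so the sketch recovers direct de-fr and fr-de examples that the English-centric corpora contain only implicitly.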

Integrating Pre-trained Language Model into Neural Machine Translation

Encoder and Decoder, Not One Less for Pre-trained Language Model Sponsored NMT

Deep Fusing Pre-trained Models into Neural Machine Translation

An Investigation On Statistical Machine Translation With Neural Language Models

Integrating Prior Translation Knowledge Into Neural Machine Translation

GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost

Acquiring Knowledge from Pre-trained Model to Neural Machine Translation

Towards Making the Most of BERT in Neural Machine Translation

Language-aware Interlingua for Multilingual Neural Machine Translation

Multiscale Collaborative Deep Models for Neural Machine Translation

Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC

Joint Space Neural Probabilistic Language Model for Statistical Machine Translation

A Study of Pre-trained Language Models in Natural Language Processing

How Does Pretraining Improve Discourse-Aware Translation?

Recent Advances in Pre-trained Language Models: Why Do They Work and How Do They Work

Language Model-Driven Unsupervised Neural Machine Translation

Simple Fusion: Return of the Language Model

Improving Neural Machine Translation with Pre-trained Representation

Character-Aware Low-Resource Neural Machine Translation With Weight Sharing And Pre-Training

Complete Multilingual Neural Machine Translation

Incorporating Pre-trained Model into Neural Machine Translation