NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang

Abstract: The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world [1]. Focusing on improving the translation quality of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind, a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture [2-7], which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose: an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.

Understanding and Analyzing Model Robustness and Knowledge-Transfer in Multilingual Neural Machine Translation using TX-Ray

Knowledge Transfer in Incremental Learning for Multilingual Neural Machine Translation

A Study of Multilingual Neural Machine Translation

Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens

Linguistic Knowledge-Aware Neural Machine Translation

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Scaling neural machine translation to 200 languages

Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies

Acquiring Knowledge from Pre-trained Model to Neural Machine Translation

Translating with Bilingual Topic Knowledge for Neural Machine Translation

Multi-granularity Knowledge Sharing in Low-resource Neural Machine Translation

An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models

Is Robustness Transferable across Languages in Multilingual Neural Machine Translation?

Augmenting Neural Machine Translation with Knowledge Graphs

Exploiting Monolingual Data at Scale for Neural Machine Translation

Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models

Towards Robust k-Nearest-Neighbor Machine Translation

Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation

Complete Multilingual Neural Machine Translation

Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation