NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang

Abstract: The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world [1]. Focusing on improving the translation quality of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind, a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture [2–7], which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose: an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.
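The abstract names the Sparsely Gated Mixture of Experts architecture but does not say how conditional computation works. The following is a minimal sketch of the general idea in plain Python, not the NLLB implementation. A gate scores every expert for one token, only the top-k experts are evaluated, and their outputs are mixed using gate weights renormalised over the selected experts. Several details are assumptions made for illustration: the linear gate, the default `k = 2`, the function names `softmax` and `moe_forward`, and the toy experts. Production MoE layers such as GShard typically also add a load-balancing loss and per-expert capacity limits, which are omitted here.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]  # assumed: each expert maps a d-vector to a d-vector


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: List[Vector],
                experts: List[Expert], k: int = 2) -> Vector:
    """Route one token vector x to its top-k experts and mix their outputs.

    gate_weights[i] is a hypothetical linear gating vector for expert i;
    the gate logit for expert i is its dot product with x.
    """
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in gate_weights]
    # Sparse gating: keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the gate probabilities over the selected experts only.
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        # Conditional computation: unselected experts are never called.
        y = experts[i](x)
        out = [o + p * y_j for o, y_j in zip(out, y)]
    return out
```

With four toy experts that scale their input by 1, 2, 3 and 4, and gate logits of 1, 2, 3 and 0 for the input `[1.0, 0.0]`, only the experts scaling by 3 and 2 are evaluated. Compute per token therefore grows with `k` rather than with the total number of experts, which is the property that makes very large multilingual MoE models affordable to run.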

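A rough check on the scale of the evaluation follows. Each ordered (source, target) language pair is a separate translation direction, so N languages give N(N-1) directions. The value N = 200 below is taken from the abstract's "200 languages". The exact count depends on how many language variants the benchmark includes, so treat this as an approximation.

```latex
D(N) = N(N-1), \qquad D(200) = 200 \times 199 = 39{,}800 \approx 40{,}000
```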
Scaling neural machine translation to 200 languages