NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang

Abstract: The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world [1]. Focusing on improving the translation qualities of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind, a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture [2,3,4,5,6,7], which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose: an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.

Semi-Supervised Neural Machine Translation Via Marginal Distribution Estimation

Dual Transfer Learning for Neural Machine Translation with Marginal Distribution Regularization

Semi-Supervised Learning for Neural Machine Translation

Exploiting Monolingual Data at Scale for Neural Machine Translation

Language Model-Driven Unsupervised Neural Machine Translation

Phrase-Based & Neural Unsupervised Machine Translation

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

Towards Neural Machine Translation with Partially Aligned Corpora

Neural Machine Translation Advised by Statistical Machine Translation

Unpaired Multimodal Neural Machine Translation via Reinforcement Learning

Improving Multilingual Translation by Representation and Gradient Regularization

Joint Training for Neural Machine Translation Models with Monolingual Data

Maximum Expected Likelihood Estimation for Zero-resource Neural Machine Translation

Relevance-guided Neural Machine Translation

Reciprocal Supervised Learning Improves Neural Machine Translation

An Investigation On Statistical Machine Translation With Neural Language Models

An Empirical study of Unsupervised Neural Machine Translation: analyzing NMT output, model's behavior and sentences' contribution

Scaling neural machine translation to 200 languages

POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation

A Study of Multilingual Neural Machine Translation

Semi-supervised Neural Machine Translation with Consistency Regularization for Low-Resource Languages