NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang

Abstract: The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world [1]. Focusing on improving the translation qualities of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind, a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture [2,3,4,5,6,7], which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose: an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.

The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

Expanding FLORES+ Benchmark for more Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation

No Language Left Behind: Scaling Human-Centered Machine Translation

From LLM to NMT: Advancing Low-Resource Machine Translation with Claude

Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM

Machine Translation Evaluation Benchmark for Wu Chinese: Workflow and Analysis

Difficulty-Aware Machine Translation Evaluation

Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM

On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?

Scaling neural machine translation to 200 languages

Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages

Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning

FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

OMGEval: an Open Multilingual Generative Evaluation Benchmark for Large Language Models

Beyond English-Centric Multilingual Machine Translation

Towards better Chinese-centric neural machine translation for low-resource languages

SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation