Abstract: Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT). In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating a massive number of languages? 2) Which factors affect LLMs' performance in translation? We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4. Our empirical results show that the translation capabilities of LLMs are continually evolving. GPT-4 has beaten the strong supervised baseline NLLB in 40.91% of translation directions but still lags far behind commercial translation systems such as Google Translate, especially on low-resource languages. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, LLMs can acquire translation ability in a resource-efficient way and generate moderate-quality translations even on zero-resource languages. Second, instruction semantics can surprisingly be ignored when in-context exemplars are given. Third, cross-lingual exemplars can provide better task guidance for low-resource translation than exemplars in the same language pair. Code will be released at: https://github.com/NJUNLP/MMT-LLM.
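The 40.91% figure can be read as a win rate over translation directions: the share of evaluated directions in which one system scores higher than another. The minimal sketch below illustrates that arithmetic under stated assumptions. The function name `win_rate`, the direction keys, and all scores are hypothetical and are not taken from the paper or its released code. Ties are counted as non-wins, which is also an assumption.

```python
from typing import Mapping


def win_rate(system: Mapping[str, float], baseline: Mapping[str, float]) -> float:
    """Fraction of shared translation directions where `system` scores strictly higher.

    Hypothetical helper: keys are direction labels such as "eng-deu",
    values are per-direction scores where higher is better.
    """
    shared = system.keys() & baseline.keys()
    if not shared:
        raise ValueError("no translation directions in common")
    # A tie is not counted as a win (an assumption of this sketch).
    wins = sum(1 for d in shared if system[d] > baseline[d])
    return wins / len(shared)


# Made-up per-direction scores for illustration only, not results from the paper.
llm = {"eng-deu": 40.1, "eng-swh": 20.3, "deu-eng": 45.0, "swh-eng": 30.2}
nllb = {"eng-deu": 38.5, "eng-swh": 25.0, "deu-eng": 44.0, "swh-eng": 33.1}
print(f"{win_rate(llm, nllb):.2%}")  # 50.00%
```

Here the hypothetical LLM wins two of the four shared directions, so the win rate is 50%. Restricting the computation to the directions both systems were evaluated on keeps the percentage comparable.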

What problem does this paper attempt to address?

This paper mainly addresses two problems:

1. **Performance of massively multilingual machine translation**: The paper evaluates how well large language models (LLMs) perform on multilingual machine translation (MMT) tasks that cover a large number of languages. Specifically, the researchers want to know whether these models can translate effectively between many languages, especially low-resource ones.
2. **Factors affecting the translation performance of LLMs**: Beyond measuring performance, the researchers experimentally analyze which factors influence LLMs' performance on MMT, including the size of the pre-training corpus, the design of in-context templates, and the selection of in-context exemplars.

To answer these questions, the researchers carried out the following work:

- **Evaluating popular LLMs**: They selected eight popular LLMs, including ChatGPT and GPT-4, and systematically evaluated them on 102 languages and 606 translation directions.
- **Comparing with supervised baselines**: They compared the LLMs with three strong supervised baselines (M2M-100, NLLB, and Google Translate), revealing the gaps between different translation paradigms.
- **Analyzing factors that affect translation performance**: Their experiments revealed several new working patterns. For example, LLMs can acquire translation ability with limited resources and can even produce moderate-quality translations for zero-resource languages. In addition, cross-lingual exemplars can provide better task guidance for low-resource translation than exemplars from the same language pair.

Through these studies, the paper both demonstrates the potential of LLMs for multilingual machine translation and identifies current challenges, in particular that performance on low-resource languages still needs to improve.
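The in-context templates and exemplars mentioned above can be illustrated with a minimal prompt builder. This is a sketch under assumptions: the `Exemplar` class, the `build_prompt` function, and the default `{x}={y}` template are hypothetical and are not the paper's exact prompt format. The usage shows a cross-lingual setup, in which the exemplars come from a different language pair (German-English) than the test sentence (Swahili).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemplar:
    """One in-context demonstration: a source sentence and its translation."""
    source: str
    target: str


def build_prompt(exemplars: list[Exemplar], source: str, template: str = "{x}={y}") -> str:
    """Render each exemplar with the template, then the test source with an empty target slot.

    The template string is a hypothetical choice; `{x}` is the source, `{y}` the target.
    """
    lines = [template.format(x=ex.source, y=ex.target) for ex in exemplars]
    # The final line leaves the target empty so the model continues it with a translation.
    lines.append(template.format(x=source, y=""))
    return "\n".join(lines)


# Cross-lingual exemplars: German-English demonstrations for a Swahili test sentence.
exemplars = [
    Exemplar("Guten Morgen.", "Good morning."),
    Exemplar("Danke schön.", "Thank you very much."),
]
prompt = build_prompt(exemplars, "Habari za asubuhi.")
print(prompt)
# Guten Morgen.=Good morning.
# Danke schön.=Thank you very much.
# Habari za asubuhi.=
```

Because the template and the exemplar list are separate arguments, template design and exemplar selection (same-pair versus cross-lingual) can be varied independently, which mirrors the two factors the paper analyzes.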

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis