Abstract: The training paradigm for machine translation has gradually shifted, from learning neural machine translation (NMT) models with extensive parallel corpora to instruction finetuning on multilingual large language models (LLMs) with high-quality translation pairs. In this paper, we focus on boosting many-to-many multilingual translation of LLMs with an emphasis on zero-shot translation directions. We demonstrate that prompt strategies adopted during finetuning are crucial to zero-shot translation and introduce a cross-lingual consistency regularization, XConST, to bridge the representation gap among different languages and improve zero-shot translation performance. XConST is not a new method, but a version of CrossConST (Gao et al., 2023a) adapted for translation instruction finetuning with LLMs. Experimental results on ALMA (Xu et al., 2023), Tower (Team, 2024), and LLaMA-2 (Touvron et al., 2023) show that our approach consistently improves translation performance. Our implementations are available at
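The abstract rests on two notions: translation pairs formatted as instructions, and the split between supervised and zero-shot directions. The sketch below makes both concrete under stated assumptions. The five-language set, the English-centric training directions, and the prompt wording (modeled on an ALMA-style template) are hypothetical choices for illustration. They are not the paper's exact configuration, and the template is not claimed to be any of its five prompting strategies. `zero_shot_directions`, `build_prompt`, and `build_training_example` are illustrative helpers.

```python
from itertools import permutations

# Hypothetical language set (code -> name), chosen only for illustration.
LANGS = {"en": "English", "de": "German", "cs": "Czech", "ru": "Russian", "zh": "Chinese"}

# Assumed English-centric supervision: every training pair has English on one side.
SUPERVISED = {("en", x) for x in LANGS if x != "en"} | {(x, "en") for x in LANGS if x != "en"}


def zero_shot_directions(langs, supervised):
    """Return all ordered language pairs that never occur as a training direction."""
    return sorted(pair for pair in permutations(langs, 2) if pair not in supervised)


def build_prompt(src_code, tgt_code, src_text):
    """Format one translation instruction (wording modeled on an ALMA-style template).

    Alternative prompt strategies would change this wording, for example where
    the source and target language names appear.
    """
    src, tgt = LANGS[src_code], LANGS[tgt_code]
    return f"Translate this from {src} to {tgt}:\n{src}: {src_text}\n{tgt}:"


def build_training_example(src_code, tgt_code, src_text, tgt_text):
    """Pair the instruction with its reference translation, the fine-tuning target."""
    return {"prompt": build_prompt(src_code, tgt_code, src_text), "completion": " " + tgt_text}


if __name__ == "__main__":
    zs = zero_shot_directions(LANGS, SUPERVISED)
    print(len(SUPERVISED), "supervised directions,", len(zs), "zero-shot directions")
    # A zero-shot request: German to Chinese never appears in SUPERVISED.
    print(build_prompt("de", "zh", "Guten Morgen."))
```

Under English-centric supervision with five languages, 8 of the 20 ordered directions are supervised and the remaining 12 (every non-English pair, such as de→zh) are zero-shot. Varying the wording inside `build_prompt` is the kind of change a comparison of prompting strategies would explore.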

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to improve the performance of large language models (LLMs) in many-to-many multilingual machine translation, with a particular focus on zero-shot translation directions. Specifically, the authors investigate how the prompting strategy used during fine-tuning affects zero-shot translation performance, and they propose a cross-lingual consistency regularization method (XConST) that narrows the representation gap between languages and thereby improves zero-shot translation.

### Main Contributions

1. **Importance of Prompting Strategies**:
   - The authors show that the choice of prompting strategy has a significant impact on zero-shot translation performance, and that no single strategy performs best in all scenarios.
2. **Cross-Lingual Consistency Regularization (XConST)**:
   - XConST is a simple yet effective training strategy that improves zero-shot translation by explicitly constraining the representations of semantically equivalent sentence pairs. It is an adaptation of CrossConST to translation instruction fine-tuning.
3. **Experimental Results**:
   - Experiments on the ALMA, Tower, and LLaMA-2 models show that XConST significantly improves many-to-many multilingual translation performance, especially in zero-shot translation directions.

### Method Overview

1. **Multilingual Fine-Tuning**:
   - The authors fine-tune pre-trained LLMs on high-quality parallel data formatted as translation instructions. They compare five different prompting strategies and evaluate each in both supervised and zero-shot translation directions.
2. **Cross-Lingual Consistency Regularization**:
   - XConST adds a Kullback-Leibler (KL) divergence regularizer that constrains the representations of semantically equivalent sentence pairs, reducing the representation gap between languages.
   The training objective is defined as:

   \[ L_{\text{XConST}}(\theta) = L_{\text{ce}}^{\text{llm}}(\theta) + \alpha L_{\text{kl}}^{\text{llm}}(\theta) \]

   where

   \[ L_{\text{ce}}^{\text{llm}}(\theta) = \ell(f(x, y, I; \theta), \hat{y}) \]

   \[ L_{\text{kl}}^{\text{llm}}(\theta) = \text{KL}(f(x, y, I; \theta) \,\|\, f(y, y, I; \theta)) \]

   Here \(x\) is the source sentence, \(y\) the target sentence, \(I\) the translation instruction, \(f(\cdot; \theta)\) the model's output distribution over the target tokens, \(\ell\) the cross-entropy loss against the reference \(\hat{y}\), and \(\alpha\) the weight of the regularizer. The KL term pulls the prediction made from the source sentence toward the prediction made when the target sentence \(y\) is fed in place of \(x\).
3. **Experimental Setup**:
   - The authors conducted experiments on multiple datasets, including WMT17 to WMT20 and FLORES-200. They performed both full-weight and low-rank adaptation (LoRA) fine-tuning on the ALMA-7B-Pretrain and ALMA-13B-Pretrain models and evaluated them in both supervised and zero-shot translation directions.

### Experimental Results

- **Zero-Shot Translation Performance**:
  - XConST significantly improved zero-shot translation performance, raising average COMET scores by 3.46 and 10.8 on the 7B and 13B models, respectively.
- **Comparison with SOTA Models**:
  - The authors' model significantly outperformed ALMA-13B-LoRA in zero-shot translation directions and performed strongly on the FLORES-200 benchmark, where it is competitive with NLLB-54.5B.

### Conclusion

By investigating different prompting strategies and proposing XConST, this paper improves many-to-many multilingual machine translation performance, particularly in zero-shot translation directions. These findings offer a useful reference point for future research.
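To make the XConST objective concrete, here is a minimal pure-Python sketch of the loss value for a single training example. It assumes the model's next-token logits at each target position are already available for two forward passes: one with the source sentence x in the prompt (the f(x, y, I; θ) branch) and one with the target sentence y placed in the source slot (the f(y, y, I; θ) branch). The function names, the toy logits, and the choice to sum the per-position KL over target positions (rather than average it) are assumptions of this sketch. It computes values only; a real trainer would compute the same quantities on tensors and backpropagate through them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs_per_step, target_ids):
    """Negative log-likelihood of the reference tokens, summed over target positions."""
    return -sum(math.log(p[t]) for p, t in zip(probs_per_step, target_ids))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def xconst_loss(logits_src, logits_tgt, target_ids, alpha=1.0):
    """Return (total, ce, kl) with total = CE(f(x, y, I), y_hat) + alpha * KL(f(x, y, I) || f(y, y, I)).

    logits_src[t]: logits at target step t when the prompt contains the source sentence x.
    logits_tgt[t]: logits at the same step when y fills the source slot (f(y, y, I) branch).
    """
    p_src = [softmax(z) for z in logits_src]
    p_tgt = [softmax(z) for z in logits_tgt]
    ce = cross_entropy(p_src, target_ids)  # supervised term uses only the source branch
    kl = sum(kl_divergence(p, q) for p, q in zip(p_src, p_tgt))
    return ce + alpha * kl, ce, kl


if __name__ == "__main__":
    # Two target positions over a toy 3-token vocabulary (made-up logits).
    from_source = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    from_target = [[1.5, 0.8, -0.5], [0.0, 1.2, 0.6]]
    total, ce, kl = xconst_loss(from_source, from_target, target_ids=[0, 1], alpha=0.5)
    print(f"total={total:.4f} ce={ce:.4f} kl={kl:.4f}")
```

When both branches produce identical distributions, the KL term vanishes and the objective reduces to ordinary cross-entropy fine-tuning; the larger α is, the more strongly the model is pushed to treat x and its translation y as interchangeable inputs.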
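The experimental setup contrasts full-weight fine-tuning with LoRA. As a reminder of what LoRA changes, the sketch below implements the standard low-rank update y = Wx + (α/r)·BAx on a frozen weight matrix in plain Python, with A Gaussian-initialized and B zero-initialized. The class and helper names, the rank r, the scaling α, and the initialization scale are hypothetical choices, not the paper's hyperparameters; note also that this α is LoRA's scaling factor and is unrelated to the XConST weight α.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_parameters(d, k, r):
    """Trainable parameters of a rank-r LoRA update for a d x k weight: B (d x r) plus A (r x k)."""
    return r * (d + k)


class LoRALinear:
    """y = W x + (alpha / r) * B (A x) with the pre-trained W kept frozen.

    Only A and B would receive gradient updates in training. B starts at zero,
    so the adapted layer initially reproduces the pre-trained layer exactly.
    """

    def __init__(self, weight, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weight, shape d x k
        self.d, self.k = len(weight), len(weight[0])
        self.r, self.scale = r, alpha / r
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.k)] for _ in range(r)]  # r x k
        self.B = [[0.0] * r for _ in range(self.d)]  # d x r, zero-initialized

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, delta)]

    def trainable_parameters(self):
        return lora_parameters(self.d, self.k, self.r)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], r=2, alpha=4.0)
    print(layer.forward([1.0, -1.0, 0.5]))  # equals W x while B is all zeros
    print(lora_parameters(4096, 4096, 16), "vs", 4096 * 4096)
```

For a hypothetical 4096 × 4096 projection with r = 16, LoRA trains 16 × (4096 + 4096) = 131,072 parameters instead of 16,777,216, about 0.78% of the full matrix, which is why it is a common lower-cost alternative to full-weight fine-tuning.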

Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models
