Abstract: The training paradigm for machine translation has gradually shifted, from learning neural machine translation (NMT) models with extensive parallel corpora to instruction finetuning on multilingual large language models (LLMs) with high-quality translation pairs. In this paper, we focus on boosting many-to-many multilingual translation of LLMs with an emphasis on zero-shot translation directions. We demonstrate that prompt strategies adopted during finetuning are crucial to zero-shot translation and introduce a cross-lingual consistency regularization, XConST, to bridge the representation gap among different languages and improve zero-shot translation performance. XConST is not a new method, but a version of CrossConST (Gao et al., 2023a) adapted for translation instruction finetuning with LLMs. Experimental results on ALMA (Xu et al., 2023), Tower (Team, 2024), and LLaMA-2 (Touvron et al., 2023) show that our approach consistently improves translation performance. Our implementations are available at
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve
This paper aims to improve the performance of large language models (LLMs) in many-to-many multilingual machine translation, with a particular focus on zero-shot translation directions. Specifically, the authors investigate how the prompting strategy used during fine-tuning affects zero-shot translation performance, and they propose a cross-lingual consistency regularization method (XConST) that reduces the representation gap between languages and thereby improves zero-shot translation.
### Main Contributions
1. **Importance of Prompting Strategies**:
- The authors demonstrate that different prompting strategies have a significant impact on zero-shot translation performance, and no single strategy performs best in all scenarios.
2. **Cross-Lingual Consistency Regularization (XConST)**:
- A simple yet effective training strategy, XConST, is proposed to improve zero-shot translation performance by explicitly encouraging the model to make consistent predictions for semantically equivalent sentence pairs. XConST is not a new method but a version of CrossConST adapted for translation instruction fine-tuning.
3. **Experimental Results**:
- Experimental results on the ALMA, Tower, and LLaMA-2 models show that XConST can significantly improve many-to-many multilingual translation performance, especially in zero-shot translation directions.
### Method Overview
1. **Multilingual Fine-Tuning**:
- The authors use high-quality parallel datasets as translation instructions to fine-tune pre-trained LLMs. They employ five different prompting strategies and evaluate their performance in both supervised and zero-shot translation directions.
2. **Cross-Lingual Consistency Regularization**:
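- XConST adds a Kullback-Leibler (KL) regularization term that pulls together the model's output distributions for semantically equivalent inputs, comparing the prediction made from the source sentence with the prediction made when the source is replaced by the target sentence. This reduces the representation gap between different languages. With x and y the source and target sentences, I the translation instruction, and α the weight of the KL term, the training objective is defined as: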
\[
L_{\text{XConST}} (\theta) = L_{\text{ce}}^{\text{llm}}(\theta) + \alpha L_{\text{kl}}^{\text{llm}}(\theta)
\]
where,
\[
L_{\text{ce}}^{\text{llm}}(\theta) = \ell(f(x, y, I; \theta), \hat{y})
\]
\[
L_{\text{kl}}^{\text{llm}}(\theta) = \text{KL}(f(x, y, I; \theta) \| f(y, y, I; \theta))
\]
3. **Experimental Setup**:
- The authors conducted experiments using multiple datasets, including WMT17 to WMT20, FLORES-200, etc. They performed full-weight and low-rank adaptation (LoRA) fine-tuning on the ALMA-7B-Pretrain and ALMA-13B-Pretrain models and evaluated the models' performance in both supervised and zero-shot translation directions.
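The objective in item 2 above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes the two forward passes f(x, y, I; θ) and f(y, y, I; θ) have already produced per-token logits over the vocabulary, and the function names (`cross_entropy`, `kl_divergence`, `xconst_loss`) and array shapes are hypothetical. Details such as whether gradients flow through both branches are not specified in the summary and are left out.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the reference tokens.
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()

def kl_divergence(logits_p, logits_q):
    # Mean per-token KL(p || q), with p and q given as logits.
    logp = log_softmax(logits_p)
    logq = log_softmax(logits_q)
    return (np.exp(logp) * (logp - logq)).sum(axis=-1).mean()

def xconst_loss(logits_xy, logits_yy, labels, alpha):
    # logits_xy: logits from the pass with the source sentence x as input (assumed shape [T, V]).
    # logits_yy: logits from the pass where x is replaced by the target y (same shape).
    # Returns L_ce + alpha * KL(f(x, y, I) || f(y, y, I)).
    ce = cross_entropy(logits_xy, labels)
    kl = kl_divergence(logits_xy, logits_yy)
    return ce + alpha * kl

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V = 5, 8                        # toy sequence length and vocabulary size
    logits_xy = rng.normal(size=(T, V))
    logits_yy = rng.normal(size=(T, V))
    labels = rng.integers(0, V, size=T)
    print(xconst_loss(logits_xy, logits_yy, labels, alpha=0.25))
```

When the two passes produce identical distributions the KL term vanishes and the loss reduces to plain cross-entropy, so α controls only how strongly disagreement between the two inputs is penalized.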
### Experimental Results
- **Zero-Shot Translation Performance**:
- XConST significantly improved zero-shot translation performance, with average COMET scores increasing by 3.46 and 10.8 on the 7B and 13B models, respectively.
- **Comparison with SOTA Models**:
- The authors' model significantly outperformed ALMA-13B-LoRA in zero-shot translation directions and showed excellent performance on the FLORES-200 benchmark, being competitive with NLLB-54.5B.
### Conclusion
By comparing different prompting strategies and proposing the XConST method, this paper improves many-to-many multilingual machine translation with LLMs, particularly in zero-shot translation directions. The findings indicate that both prompt design and cross-lingual consistency regularization matter when fine-tuning LLMs for translation.