Adapting Large Language Models for Document-Level Machine Translation

Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari
2024-10-11
Abstract: Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks. Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning. This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs. We first investigate the impact of prompt strategies on translation performance and then conduct extensive experiments using two fine-tuning methods, three LLM backbones, and 18 translation tasks across nine language pairs. Our results show that specialized models can sometimes surpass GPT-4 in translation performance but still face issues like off-target translation due to error propagation in decoding. We provide an in-depth analysis of these LLMs tailored for DocMT, examining translation errors, discourse phenomena, strategies for training and inference, the data efficiency of parallel documents, recent test set evaluations, and zero-shot crosslingual transfer. Our findings highlight the strengths and limitations of LLM-based DocMT models and provide a foundation for future research.
Computation and Language
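The abstract attributes off-target translation to error propagation in decoding. As a hedged illustration of how that can arise, the Python sketch below translates a document sentence by sentence and feeds each previously generated translation back into the prompt as context. The prompt template, the `window` size, and the names `build_prompt` and `translate_document` are hypothetical and are not the authors' prompt; `translate` stands in for any LLM call.

```python
from typing import Callable, List


def build_prompt(src_sents: List[str], tgt_sents: List[str], i: int,
                 src_lang: str, tgt_lang: str, window: int = 3) -> str:
    """Hypothetical template: up to `window` preceding sentence pairs as context."""
    start = max(0, i - window)
    lines = [f"Translate the following {src_lang} text into {tgt_lang}."]
    for j in range(start, i):
        lines.append(f"{src_lang}: {src_sents[j]}")
        lines.append(f"{tgt_lang}: {tgt_sents[j]}")  # the model's own earlier output
    lines.append(f"{src_lang}: {src_sents[i]}")
    lines.append(f"{tgt_lang}:")  # the model continues from here
    return "\n".join(lines)


def translate_document(src_sents: List[str], translate: Callable[[str], str],
                       src_lang: str, tgt_lang: str, window: int = 3) -> List[str]:
    """Translate sentence by sentence, feeding earlier outputs back as context."""
    tgt_sents: List[str] = []
    for i in range(len(src_sents)):
        prompt = build_prompt(src_sents, tgt_sents, i, src_lang, tgt_lang, window)
        tgt_sents.append(translate(prompt).strip())
    return tgt_sents


if __name__ == "__main__":
    # Stand-in for an LLM call: it just numbers its outputs so the context is visible.
    seen_prompts: List[str] = []

    def fake_llm(prompt: str) -> str:
        seen_prompts.append(prompt)
        return f"<translation {len(seen_prompts)}>"

    doc = ["First sentence.", "Second sentence.", "Third sentence."]
    translate_document(doc, fake_llm, "English", "German", window=1)
    print(seen_prompts[2])  # context holds sentence 2 and its generated translation
```

Because every generated sentence re-enters later prompts, a single sentence emitted in the wrong language becomes context for the following ones. This is one plausible route for the error propagation the abstract describes, not a claim about the paper's exact decoding setup.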
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses several key questions in document-level machine translation (DocMT), chiefly how to adapt large language models (LLMs) to improve translation performance for specific language pairs. The research focuses on the following aspects:

1. **Adaptation Strategies**: Exploring how two fine-tuning methods, Parameter-Efficient Fine-Tuning (PEFT) and Full Fine-Tuning (FFT), together with different prompting strategies, can enable moderately-sized LLMs to perform well on document-level translation.
2. **Performance Evaluation**: Evaluating three LLM backbones (LLaMA 2-7B, BLOOM-7B, and Vicuna-7B) on 18 translation tasks spanning 9 language pairs.
3. **Error Analysis**: Analyzing the types of errors LLMs make in document-level translation, particularly off-target translation (output in a language other than the requested one), which often arises from error propagation during decoding.
4. **Zero-Shot Cross-Lingual Transfer**: Investigating how well the fine-tuned LLMs transfer zero-shot to language pairs not seen during fine-tuning.
5. **Data Efficiency**: Measuring how much parallel document data fine-tuning needs, i.e., how well models fine-tuned on limited data perform and how the data requirements differ between fine-tuning strategies.

### Main Findings

1. **Selective Excellence**: Fine-tuned moderately-sized LLMs can outperform GPT-4-Turbo on some translation tasks but suffer from off-target translation on others, mainly because errors propagate during decoding.
2. **Fine-Tuning Strategies**: PEFT generally outperforms FFT, but FFT is more data-efficient: with only about 1% of the full training set, it reaches performance comparable to models trained on the entire dataset.
3. **Recent Test Set Evaluation**: On the WMT2023 test set, LLM-based DocMT models generalize better to out-of-domain text than conventional DocMT models.
4. **Advantages of Base LLMs**: Base LLMs serve as better backbones for task-specific supervised fine-tuning than instruction-tuned LLMs, and they show more effective zero-shot cross-lingual transfer.

### Conclusion

Through extensive experiments, this research shows both the potential and the limitations of LLMs for document-level machine translation and provides a foundation for future research. It highlights the roles of prompting strategies, fine-tuning methods, and data efficiency in improving the translation performance of LLMs.
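Off-target translation means the model answers in a language other than the requested one. A rough way to quantify it, sketched below under clearly simplifying assumptions, is to label each alphabetic character with a coarse Unicode script and flag a hypothesis whose share of characters in the expected script falls below a threshold. The helper names, the 0.5 threshold, and the script-based heuristic are all hypothetical and are not the paper's evaluation method. The heuristic only works when source and target use different scripts (e.g., English to Chinese); it cannot separate languages that share a script, such as German and French, and Japanese would need Han and Kana treated together. A real language-identification model is needed in general.

```python
import unicodedata
from typing import List, Optional

# Coarse script labels keyed by the prefix of a character's Unicode name.
# This mapping is a simplifying assumption, not a full script database.
_SCRIPT_PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "LATIN": "Latin",
    "ARABIC": "Arabic",
    "HIRAGANA": "Kana",
    "KATAKANA": "Kana",
    "HANGUL": "Hangul",
}


def char_script(ch: str) -> Optional[str]:
    """Return a coarse script label for an alphabetic character, else None."""
    if not ch.isalpha():
        return None  # digits, spaces, and punctuation carry no script signal
    name = unicodedata.name(ch, "")
    for prefix, script in _SCRIPT_PREFIXES.items():
        if name.startswith(prefix):
            return script
    return "Other"


def is_off_target(hypothesis: str, expected_script: str, threshold: float = 0.5) -> bool:
    """Flag a hypothesis whose share of expected-script letters is below `threshold`."""
    scripts = [s for s in map(char_script, hypothesis) if s is not None]
    if not scripts:
        return False  # nothing to judge, e.g. a line containing only numbers
    share = sum(s == expected_script for s in scripts) / len(scripts)
    return share < threshold


def off_target_rate(hypotheses: List[str], expected_script: str,
                    threshold: float = 0.5) -> float:
    """Fraction of hypotheses flagged as off-target."""
    if not hypotheses:
        return 0.0
    flagged = sum(is_off_target(h, expected_script, threshold) for h in hypotheses)
    return flagged / len(hypotheses)


if __name__ == "__main__":
    # Toy English-to-Chinese outputs; the second sentence came out in English.
    outputs = [
        "我们今天讨论机器翻译。",
        "We discuss document-level translation.",
        "这是第三句。",
    ]
    print(off_target_rate(outputs, expected_script="Han"))  # → 0.3333333333333333
```

Working at the sentence level, rather than over the whole document, makes it visible when a document drifts into the wrong language partway through, which is the pattern one would expect if errors propagate during sentence-by-sentence decoding.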