Abstract: Traditionally, success in multilingual machine translation can be attributed to three key factors in training data: large volume, diverse translation directions, and high quality. In the current practice of fine-tuning large language models (LLMs) for translation, we revisit the importance of these factors. We find that LLMs display strong translation capability after being fine-tuned on as few as 32 parallel sentences and that fine-tuning on a single translation direction enables translation in multiple directions. However, the choice of direction is critical: fine-tuning LLMs with only English on the target side can lead to task misinterpretation, which hinders translation into non-English languages. Problems also arise when noisy synthetic data is placed on the target side, especially when the target language is well-represented in LLM pre-training. Yet interestingly, synthesized data in an under-represented language has a less pronounced effect. Our findings suggest that when adapting LLMs to translation, the requirement on data quantity can be eased but careful considerations are still crucial to prevent an LLM from exploiting unintended data biases.

What problem does this paper attempt to address?

This paper asks whether large language models (LLMs) can be adapted to machine translation (MT) by fine-tuning on a small amount of possibly noisy data. Specifically, it addresses three questions:

1. **Data volume requirements**: Traditionally, the success of multilingual machine translation has depended on large, diverse, high-quality parallel corpora. The paper studies whether the required data volume can be reduced substantially when fine-tuning LLMs, and whether effective translation performance can be achieved with as few as 32 parallel sentences.
2. **Impact of a single translation direction**: The paper examines whether fine-tuning on only one translation direction enables the model to translate in multiple directions. It finds that the choice of direction is critical: fine-tuning with only English on the target side can lead to task misinterpretation, which hinders translation into non-English languages.
3. **Impact of synthetic data**: The paper explores the effect of fine-tuning on low-quality synthetic data (such as data generated by back-translation or word-by-word translation). Noise on the target side degrades performance, especially when the target language is well represented in LLM pre-training. For under-represented languages, the effect of synthetic data is less pronounced.

### Main research contents

- **Data efficiency**: The paper studies how different amounts of fine-tuning data (from 32 to 4,096 samples) affect translation performance, and finds that in some cases a small amount of high-quality parallel data can significantly improve translation results.
- **Selection of translation direction**: The paper explores fine-tuning on only one translation direction and analyzes how well the result generalizes to other language pairs. Avoiding training data with only English on the target side prevents task misinterpretation and thus improves translation into non-English languages.
- **Role of synthetic data**: The paper evaluates how low-quality synthetic data (such as back-translation and word-by-word translation) affects model performance. Data quality matters most on the target side, but for under-represented languages the model is more robust to noise.

### Conclusion

The main conclusion of the paper is that when fine-tuning LLMs for translation, a small amount of high-quality parallel data can significantly improve translation performance, but the translation direction must be chosen carefully to avoid task misinterpretation. The quality of synthetic data also matters: when the target language is well represented in pre-training, low-quality synthetic data on the target side may degrade performance.
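The data setups summarized above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the prompt wording, the power-of-two sizes between 32 and 4,096, and the toy lexicon are all assumptions made for illustration. It shows three things: drawing nested fine-tuning subsets of increasing size, formatting a sentence pair as a translation instruction, and producing a noisy word-by-word "translation" from a dictionary.

```python
import random

# Assumed size grid: the summary only states "from 32 to 4,096 samples";
# the powers of two in between are an illustrative choice.
DEFAULT_SIZES = (32, 64, 128, 256, 512, 1024, 2048, 4096)


def nested_subsets(pairs, sizes=DEFAULT_SIZES, seed=0):
    """Shuffle once, then take prefixes so every smaller subset is
    contained in every larger one. Sizes above len(pairs) are skipped."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return {n: shuffled[:n] for n in sizes if n <= len(shuffled)}


def to_example(src, tgt, src_lang, tgt_lang):
    """Format one sentence pair as a prompt/completion record.
    The instruction wording is hypothetical, not taken from the paper."""
    prompt = f"Translate the following text from {src_lang} to {tgt_lang}.\n{src}\n"
    return {"prompt": prompt, "completion": tgt}


def word_by_word(sentence, lexicon):
    """Noisy synthetic target side: replace each whitespace token by a
    dictionary lookup and keep tokens that are missing from the lexicon."""
    return " ".join(lexicon.get(tok.lower(), tok) for tok in sentence.split())


if __name__ == "__main__":
    # Toy stand-in for a real parallel corpus.
    corpus = [(f"source sentence {i}", f"target sentence {i}") for i in range(5000)]
    subsets = nested_subsets(corpus)
    for size, subset in subsets.items():
        records = [to_example(s, t, "German", "English") for s, t in subset]
        print(size, len(records))

    toy_lexicon = {"the": "die", "cat": "Katze"}  # hypothetical entries
    print(word_by_word("the cat sleeps", toy_lexicon))
```

Taking prefixes of a single shuffle keeps the subsets nested, so a comparison across sizes reflects the added data and not a different random draw. Swapping the source and target arguments in `to_example` gives the reverse direction, which is one way to set up a single-direction comparison with or without English on the target side.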

Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice?
