Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models

Jianhui Pang, Fanghua Ye, Longyue Wang, Dian Yu, Derek F. Wong, Shuming Shi, Zhaopeng Tu
2024-01-17
Abstract: The evolution of Neural Machine Translation (NMT) has been significantly influenced by six core challenges (Koehn and Knowles, 2017), which have served as benchmarks for progress in the field. This study revisits these challenges (domain mismatch, amount of parallel data, rare word prediction, translation of long sentences, attention model as word alignment, and sub-optimal beam search) and examines their ongoing relevance in the context of advanced Large Language Models (LLMs). Our empirical findings indicate that LLMs effectively lessen the reliance on parallel data for major languages in the pretraining phase. Additionally, the LLM-based translation system significantly improves the translation of long sentences of approximately 80 words and can translate documents of up to 512 words. However, despite these improvements, the challenges of domain mismatch and rare word prediction persist. While the challenges of word alignment and beam search, which are specific to NMT, may not apply to LLMs, we identify three new challenges for LLMs in translation tasks: inference efficiency, translation of low-resource languages in the pretraining phase, and human-aligned evaluation. The datasets and models are released at
Computation and Language
What problem does this paper attempt to address?
The paper asks whether the six classic challenges that Koehn and Knowles (2017) identified for neural machine translation (NMT) still hold, or have changed, now that large language models (LLMs) are used for machine translation (MT). The six classic challenges are:

1. **Domain Mismatch**
   - Problem description: Texts from different domains differ in terminology, style, and so on, so translation quality drops when a system handles text from a domain it was not trained on.
   - Research findings: Although LLMs see a large amount of diverse data during pretraining, they still struggle with domain-specific text, showing terminology mismatches, style discrepancies, and hallucinations.
2. **Amount of Parallel Data**
   - Problem description: Traditional NMT systems rely on large parallel corpora for training. Do LLMs still need large amounts of parallel data?
   - Research findings: LLMs reduce the dependence on parallel data for high-resource languages. A small amount of high-quality parallel data can significantly improve translation performance, whereas too much parallel data can actually degrade it.
3. **Rare Word Prediction**
   - Problem description: How to accurately predict and translate rare words, such as proper nouns and compound words.
   - Research findings: LLMs predict high-frequency words well, but for rare words that appear fewer than 8 times, precision is low and the deletion rate is high.
4. **Translation of Long Sentences**
   - Problem description: Translating long sentences requires accurately capturing contextual information, which places greater demands on a system's understanding ability.
   - Research findings: LLMs translate long sentences (about 80 words) well and can handle document-level translation (up to 512 words), substantially outperforming traditional NMT models.
5. **Word Alignment**
   - Problem description: Whether word alignments between the source and target languages can be extracted from the attention mechanism, as a way to explain how the translation model works.
   - Research findings: It is not feasible to extract word alignments from the attention weights of LLMs, but aggregated attention weights can still serve as clues for interpreting LLMs.
6. **Sub-optimal Beam Search**
   - Problem description: How the decoding strategy used at inference time (such as beam search or sampling) affects translation quality.
   - Research findings: Beam search outperforms sampling in BLEU score, but sampling performs better on rare words.

In addition, the paper identifies three new challenges for LLM-based translation:
- Inference Efficiency: LLM inference is much slower than that of traditional NMT models, which increases latency.
- Pretraining Resource Imbalance for Low-Resource Languages
- Human-Aligned Evaluation

By re-examining these challenges, the paper offers insights and directions for future research.
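The rare-word finding is stated in terms of precision and deletion rate per word-frequency band. A minimal sketch of how such per-band numbers can be computed follows, assuming a simplified bag-of-words match between each hypothesis and its reference on the target side; the paper's exact procedure may differ. The bucket boundaries and the helper names `bucket_of` and `rare_word_stats` are hypothetical; only the threshold of 8 occurrences echoes the finding.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical frequency buckets: inclusive (low, high) bounds on a word's training count.
# The (1, 7) bucket corresponds to "fewer than 8 occurrences".
BUCKETS: List[Tuple[int, int]] = [(0, 0), (1, 7), (8, 63), (64, 10**12)]


def bucket_of(count: int) -> Tuple[int, int]:
    """Return the bucket whose inclusive range contains `count`."""
    for lo, hi in BUCKETS:
        if lo <= count <= hi:
            return (lo, hi)
    raise ValueError(f"no bucket for count {count}")


def rare_word_stats(
    hyps: List[List[str]],
    refs: List[List[str]],
    train_freq: Dict[str, int],
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Bag-of-words precision and deletion rate per training-frequency bucket.

    precision = matched hypothesis tokens / hypothesis tokens in the bucket
    deletion  = reference tokens left unmatched / reference tokens in the bucket
    Matches are clipped per sentence pair (min of the two counts).
    Buckets with no tokens report NaN.
    """
    hyp_total, hyp_match = Counter(), Counter()
    ref_total, ref_deleted = Counter(), Counter()
    for hyp, ref in zip(hyps, refs):
        hyp_counts, ref_counts = Counter(hyp), Counter(ref)
        for word in set(hyp_counts) | set(ref_counts):
            b = bucket_of(train_freq.get(word, 0))  # unseen words fall in (0, 0)
            matched = min(hyp_counts[word], ref_counts[word])
            hyp_total[b] += hyp_counts[word]
            hyp_match[b] += matched
            ref_total[b] += ref_counts[word]
            ref_deleted[b] += ref_counts[word] - matched
    nan = float("nan")
    return {
        b: {
            "precision": hyp_match[b] / hyp_total[b] if hyp_total[b] else nan,
            "deletion": ref_deleted[b] / ref_total[b] if ref_total[b] else nan,
        }
        for b in BUCKETS
    }
```

Reporting the two rates separately mirrors the finding: a system can be precise on the rare words it does emit while still dropping many of them.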
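For the word-alignment challenge, a common way to read alignments off attention is to link each target token to the source token it attends to most, then score those links against gold alignments with the alignment error rate, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set, S the sure gold links, and P the possible gold links. The sketch assumes a single target-by-source attention matrix and leaves open how heads and layers are aggregated; the function names are hypothetical, and this is not the paper's evaluation code.

```python
from typing import List, Set, Tuple

# An alignment is a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]


def alignment_from_attention(attn: List[List[float]]) -> Alignment:
    """Link each target token to the source token with the highest attention weight.

    attn[t][s] is the weight from target position t to source position s.
    Every row must be non-empty.
    """
    links: Alignment = set()
    for t, row in enumerate(attn):
        best_s = max(range(len(row)), key=lambda s: row[s])
        links.add((best_s, t))
    return links


def aer(predicted: Alignment, sure: Alignment, possible: Alignment) -> float:
    """Alignment error rate: 1 - (|A & S| + |A & P|) / (|A| + |S|). Lower is better."""
    possible = possible | sure  # sure links count as possible by definition
    denom = len(predicted) + len(sure)
    if denom == 0:
        return 0.0  # nothing predicted and nothing to find
    return 1.0 - (len(predicted & sure) + len(predicted & possible)) / denom


if __name__ == "__main__":
    attn = [[0.7, 0.2, 0.1],
            [0.1, 0.1, 0.8]]
    pred = alignment_from_attention(attn)  # {(0, 0), (2, 1)}
    print(aer(pred, sure={(0, 0)}, possible={(0, 0), (1, 1)}))
```

A high AER from this argmax procedure is one way the infeasibility of attention-based alignment for LLMs would show up in practice.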
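The sub-optimal beam search challenge contrasts beam search, which searches for a high-probability output, with sampling, which draws each token from the model's distribution. The toy sketch below shows both decoding strategies over a hypothetical next-token function `toy_model`, a stand-in for an LLM rather than any real API. It scores hypotheses by summed log-probability with no length normalization, and it illustrates the mechanics only; it makes no claim about the BLEU or rare-word results reported in the paper.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# A "model" maps a prefix (tuple of tokens) to a next-token distribution.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]
EOS = "</s>"


def beam_search(model: NextTokenFn, beam_size: int, max_len: int) -> Tuple[Tuple[str, ...], float]:
    """Return the best hypothesis and its summed log-probability (no length normalization)."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, p in model(prefix).items():
                if p > 0:
                    candidates.append((prefix + (tok,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Hypotheses ending in EOS leave the beam; the rest keep expanding.
            if seq[-1] == EOS:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    pool = finished or beams  # fall back to unfinished hypotheses if max_len was hit
    return max(pool, key=lambda c: c[1])


def sample(model: NextTokenFn, max_len: int, rng: random.Random, temperature: float = 1.0) -> Tuple[str, ...]:
    """Ancestral sampling: draw each next token from the tempered model distribution."""
    seq: Tuple[str, ...] = ()
    for _ in range(max_len):
        dist = model(seq)
        toks = list(dist)
        weights = [dist[t] ** (1.0 / temperature) for t in toks]
        seq += (rng.choices(toks, weights=weights, k=1)[0],)
        if seq[-1] == EOS:
            break
    return seq


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical toy distribution; any unlisted prefix ends the sentence."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 0.9, "y": 0.1},
    }
    return table.get(prefix, {EOS: 1.0})
```

With this toy table, a beam of size 1 (greedy decoding) returns `a x </s>` with probability 0.30, while a beam of size 2 finds `b z </s>` with probability 0.36. Sampling with a fixed seed returns one of the four possible sentences, which is why it produces more varied output than beam search.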