Context-aware and Style-related Incremental Decoding framework for Discourse-Level Literary Translation

Yuanchang Luo, Jiaxin Guo, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhanglin Wu, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Hao Yang
2024-09-29
Abstract: This report outlines our approach for the WMT24 Discourse-Level Literary Translation Task, focusing on the Chinese-English language pair in the Constrained Track. Translating literary texts poses significant challenges due to the nuanced meanings, idiomatic expressions, and intricate narrative structures inherent in such works. To address these challenges, we leveraged the Chinese-Llama2 model, specifically enhanced for this task through a combination of Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT). Our methodology includes a novel Incremental Decoding framework, which ensures that each sentence is translated with consideration of its broader context, maintaining coherence and consistency throughout the text. This approach allows the model to capture long-range dependencies and stylistic elements, producing translations that faithfully preserve the original literary quality. Our experiments demonstrate significant improvements in both sentence-level and document-level BLEU scores, underscoring the effectiveness of our proposed framework in addressing the complexities of document-level literary translation.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the difficulty of translating literary texts at the discourse level, specifically for the Chinese-English language pair. It focuses on three key challenges:

1. **Semantic and Stylistic Consistency**: Literary texts are rich in nuanced meanings, idiomatic expressions, and complex narrative structures, all of which place high demands on machine translation. Traditional sentence-level translation methods often fail to maintain consistency and coherence at the broader discourse level.
2. **Long-distance Dependencies**: Information introduced early in a literary work may affect later parts of the text, so the model must capture long-distance dependencies. Traditional models tend to lose or misuse this context when handling long texts.
3. **Lack of High-quality Parallel Corpora**: High-quality parallel corpora in the literary domain are relatively scarce, which limits how many diverse examples the model can learn from.

To address these challenges, the authors propose a **Context-aware and Style-related Incremental Decoding framework** and combine the following techniques:

- **Continual Pre-training (CPT)**: Continue pre-training on a large amount of monolingual literary data to improve the model's understanding of literary texts.
- **Supervised Fine-Tuning (SFT)**: Fine-tune the model with task-specific instructions so that translations are both accurate and faithful to the literary quality of the original text.
- **Incremental Decoding Framework**: Take the translations of previous sentences into account while translating, so that each sentence is translated consistently with its context, improving the overall coherence and consistency of the translation.
Through these methods, the authors aim to generate translations that are both accurate and faithful to the literary quality of the original text, particularly in discourse-level translation. The reported experiments show significant improvements in both sentence-level and document-level BLEU scores, which supports the effectiveness of the proposed framework.
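The incremental decoding idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the `window` parameter, and the `translate_fn` callable (standing in for a call to the fine-tuned model) are all assumptions made for illustration. The sketch only shows the core loop, in which each sentence is translated with the preceding source sentences and their translations supplied as context.

```python
from typing import Callable, List, Tuple


def build_prompt(context: List[Tuple[str, str]], source: str) -> str:
    """Build a prompt from earlier (source, translation) pairs plus the new sentence.

    The instruction text and layout are hypothetical, not taken from the paper.
    """
    lines = ["Translate the Chinese sentence into English, staying consistent with the context."]
    for src, tgt in context:
        lines.append(f"Chinese: {src}")
        lines.append(f"English: {tgt}")
    lines.append(f"Chinese: {source}")
    lines.append("English:")
    return "\n".join(lines)


def incremental_decode(
    sentences: List[str],
    translate_fn: Callable[[str], str],
    window: int = 3,
) -> List[str]:
    """Translate sentences in order, feeding back the most recent translations.

    translate_fn is a placeholder for the model call: it takes a prompt string
    and returns the generated translation. window is an assumed limit on how
    many previous sentence pairs are kept as context.
    """
    history: List[Tuple[str, str]] = []
    outputs: List[str] = []
    for source in sentences:
        # Guard window == 0 explicitly, since history[-0:] would return everything.
        context = history[-window:] if window > 0 else []
        prompt = build_prompt(context, source)
        translation = translate_fn(prompt).strip()
        history.append((source, translation))
        outputs.append(translation)
    return outputs
```

In this sketch the context is a fixed-size sliding window over previous sentence pairs, chosen only to keep the prompt length bounded. A real system might select context differently, for example by also conditioning on stylistically similar examples.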