Abstract: This report outlines our approach for the WMT24 Discourse-Level Literary Translation Task, focusing on the Chinese-English language pair in the Constrained Track. Translating literary texts poses significant challenges due to the nuanced meanings, idiomatic expressions, and intricate narrative structures inherent in such works. To address these challenges, we leveraged the Chinese-Llama2 model, specifically enhanced for this task through a combination of Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT). Our methodology includes a novel Incremental Decoding framework, which ensures that each sentence is translated with consideration of its broader context, maintaining coherence and consistency throughout the text. This approach allows the model to capture long-range dependencies and stylistic elements, producing translations that faithfully preserve the original literary quality. Our experiments demonstrate significant improvements in both sentence-level and document-level BLEU scores, underscoring the effectiveness of our proposed framework in addressing the complexities of document-level literary translation.

What problem does this paper attempt to address?

This paper addresses the difficulties of discourse-level literary translation for the Chinese-English language pair. It focuses on the following key challenges:

1. **Semantic and Stylistic Consistency**: Literary texts are rich in nuanced meanings, idiomatic expressions, and complex narrative structures, which place high demands on machine translation. Traditional sentence-level translation methods often fail to maintain consistency and coherence at the broader discourse level.
2. **Long-distance Dependencies**: Information introduced early in a literary work may affect later parts of the text, so the model must capture long-distance dependencies. Traditional models are prone to losing or misusing this contextual information when handling long texts.
3. **Lack of High-quality Parallel Corpora**: High-quality parallel corpora in the literary domain are relatively scarce, which limits how many diverse examples the model can learn from.

To address these challenges, the paper proposes a **Context-aware and Style-related Incremental Decoding framework** and combines it with the following techniques:

- **Continual Pre-training (CPT)**: Continue pre-training on a large amount of monolingual literary data to strengthen the model's understanding of literary texts.
- **Supervised Fine-Tuning (SFT)**: Fine-tune the model with task-specific instructions so that translations are both accurate and faithful to the literary quality of the original text.
- **Incremental Decoding Framework**: Take the translations of previous sentences into account while translating, so that each sentence is rendered consistently with its context, improving the overall coherence and consistency of the translation.

With these methods, the paper aims to generate translations that are both accurate and faithful to the literary quality of the original text, particularly in discourse-level translation. The reported experiments show significant improvements in both sentence-level and document-level BLEU scores, which supports the effectiveness of the approach.
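The incremental decoding idea summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the prompt wording, the `window` size, and the `translate_fn` callable (a stand-in for whatever fine-tuned model generates the translation) are all assumptions made for illustration. The sketch only shows the control flow, in which each sentence is translated together with the most recent source sentences and their already produced translations.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def build_prompt(source: str, context: List[Tuple[str, str]]) -> str:
    """Assemble a prompt from earlier (source, translation) pairs plus the new sentence.

    The wording of this prompt is hypothetical, not taken from the paper.
    """
    lines = [
        "Translate the Chinese sentence into English, keeping the style "
        "and terminology of the preceding context."
    ]
    for prev_src, prev_tgt in context:
        lines.append(f"Context source: {prev_src}")
        lines.append(f"Context translation: {prev_tgt}")
    lines.append(f"Source: {source}")
    lines.append("Translation:")
    return "\n".join(lines)


def incremental_decode(
    sentences: List[str],
    translate_fn: Callable[[str], str],
    window: int = 3,
) -> List[str]:
    """Translate sentences in order, feeding back the last `window` translated pairs."""
    # deque(maxlen=...) silently drops the oldest pair once the window is full.
    history: Deque[Tuple[str, str]] = deque(maxlen=window)
    outputs: List[str] = []
    for src in sentences:
        prompt = build_prompt(src, list(history))
        tgt = translate_fn(prompt).strip()
        outputs.append(tgt)
        history.append((src, tgt))
    return outputs


if __name__ == "__main__":
    # Placeholder "model" that tags the source sentence instead of translating it.
    def fake_model(prompt: str) -> str:
        last_source = prompt.rsplit("Source: ", 1)[1].split("\n", 1)[0]
        return f"<EN:{last_source}>"

    for line in incremental_decode(["第一句。", "第二句。", "第三句。"], fake_model, window=2):
        print(line)
```

Passing the model as a plain callable keeps the loop independent of any particular inference library; in practice `translate_fn` would wrap a call to the fine-tuned model, and the window size would be chosen to fit its context length.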

Context-aware and Style-related Incremental Decoding framework for Discourse-Level Literary Translation
