Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving

Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang
2024-12-30
Abstract: Unlike traditional translation tasks, classical Chinese poetry translation requires both adequacy and fluency in translating culturally and historically significant content, as well as linguistic poetic elegance. Large language models (LLMs) with impressive multilingual capabilities may offer a way to meet this extreme translation demand. This paper first introduces a suitable benchmark (PoetMT) in which each Chinese poem has a recognized elegant translation. We also propose a new metric based on GPT-4 to evaluate the extent to which current LLMs can meet these demands. Our empirical evaluation reveals that existing LLMs fall short on this challenging task. Hence, we propose a Retrieval-Augmented Machine Translation (RAT) method, which incorporates knowledge related to classical poetry to advance the translation of Chinese poetry with LLMs. Experimental results show that RAT consistently outperforms all comparison methods on the widely used BLEU, COMET, and BLEURT metrics, on our proposed metric, and in human evaluation.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks whether existing large language models (LLMs) can meet the demanding standard of "adequacy, fluency, and elegance" when translating classical Chinese poetry. Specifically, it focuses on three requirements:

1. **Adequacy**: The translation accurately conveys the historical and cultural background as well as the semantic content of the original text.
2. **Fluency**: The translated poem reads naturally and conforms to the grammar and idiomatic usage of the target language.
3. **Elegance**: The translation preserves the poetic beauty of the original text, including its rhythm, structure, and concise language style.

### Research Background

Classical Chinese poetry carries a deep cultural and historical background and follows strict rules of rhythm and structure. Translating it into English accurately, fluently, and elegantly is therefore an extremely challenging task. Although existing large language models (such as ChatGPT) show strong multilingual capabilities, they still fall short on this special type of translation.

### Solutions

To evaluate and improve the performance of existing LLMs on this task, the authors make the following contributions:

1. **Constructing a benchmark dataset (PoetMT)**:
   - The authors created a benchmark dataset named PoetMT, which contains 608 classical Chinese poems paired with professional English translations. The poems cover works from the Tang, Song, and Yuan dynasties.
2. **Proposing a new evaluation metric**:
   - The authors developed a new automatic evaluation metric based on GPT-4 that scores translation quality along three dimensions: adequacy, fluency, and elegance. This metric is intended to fit the characteristics of classical poetry better than traditional metrics such as BLEU and COMET.
3. **Introducing a retrieval-augmented machine translation method (RAT)**:
   - The authors proposed Retrieval-Augmented Machine Translation (RAT), which improves translation quality by drawing on a knowledge base of classical poetry. RAT first retrieves information related to the poem to be translated from the knowledge base and then uses this information to generate the final translation.

### Experimental Results

Experiments show that existing LLMs fall short when translating classical Chinese poetry, especially in maintaining the rhythm and structure of the poems. The RAT method, by contrast, consistently outperforms the comparison methods on BLEU, COMET, BLEURT, the proposed GPT-4-based metric, and human evaluation. The experiments also found that knowledge of the modern Chinese translation helps considerably in improving translation quality, but that relying on the modern Chinese translation alone does not fully solve the problem; multiple sources of knowledge still need to be combined.

### Summary

This research provides new tools and methods for evaluating and improving the performance of LLMs on classical Chinese poetry translation. It also points to a direction for future research: how to better combine domain-specific knowledge to improve machine translation.
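
The retrieve-then-translate flow described for RAT under Solutions can be illustrated with a small sketch. This is not the authors' implementation: the `KnowledgeEntry` record, the character-overlap scorer, the `SAMPLE_KB` contents, and the prompt wording are all assumptions made for illustration, and the call to an actual LLM is left out. It only shows the general shape of the idea, which is to look up knowledge related to the poem and place it in the prompt ahead of the translation request.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KnowledgeEntry:
    """Hypothetical knowledge-base record; the field names are assumptions."""
    title: str
    text: str   # classical Chinese source text
    notes: str  # background knowledge, e.g. author, dynasty, theme


def char_overlap(a: str, b: str) -> float:
    """Jaccard similarity over character sets: a toy stand-in for a real retriever."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def retrieve(poem: str, kb: List[KnowledgeEntry], top_k: int = 1) -> List[KnowledgeEntry]:
    """Return the top_k entries whose source text best overlaps the query poem."""
    ranked = sorted(kb, key=lambda e: char_overlap(poem, e.text), reverse=True)
    return ranked[:top_k]


def build_prompt(poem: str, retrieved: List[KnowledgeEntry]) -> str:
    """Assemble a translation prompt that carries the retrieved knowledge."""
    knowledge = "\n".join(f"- {e.title}: {e.notes}" for e in retrieved)
    return (
        "Translate the classical Chinese poem below into English.\n"
        "Aim for adequacy, fluency, and elegance.\n\n"
        f"Related knowledge:\n{knowledge}\n\n"
        f"Poem:\n{poem}\n"
    )


# Tiny illustrative knowledge base (assumed contents, not taken from PoetMT).
SAMPLE_KB = [
    KnowledgeEntry(
        title="静夜思",
        text="床前明月光，疑是地上霜。举头望明月，低头思故乡。",
        notes="Li Bai, Tang dynasty; moonlight prompts thoughts of home.",
    ),
    KnowledgeEntry(
        title="春晓",
        text="春眠不觉晓，处处闻啼鸟。夜来风雨声，花落知多少。",
        notes="Meng Haoran, Tang dynasty; a spring morning after a night of rain.",
    ),
]

if __name__ == "__main__":
    query = "床前明月光，疑是地上霜。"
    hits = retrieve(query, SAMPLE_KB)
    print(build_prompt(query, hits))
```

In a fuller system the resulting prompt would be sent to an LLM, and the overlap scorer would likely be replaced by a stronger retriever. The summary's observation that a modern Chinese translation alone is not sufficient suggests that the notes field would combine several kinds of knowledge.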