Guwen-UNILM: Machine Translation Between Ancient and Modern Chinese Based on Pre-Trained Models

Zinong Yang, Ke-jia Chen, Jingqiang Chen
DOI: https://doi.org/10.1007/978-3-030-88480-2_10
2021-01-01
Abstract: Ancient Chinese literature is not only a unique part of China's cultural heritage but also a treasure of world civilization. However, as the language has evolved over its long history, it has become difficult for modern readers to comprehend ancient works, let alone compose in the ancient style. Translation therefore plays a key role in bridging the two eras. This paper develops an automatic translation method between ancient and modern Chinese. First, because no parallel corpus was openly available, an open-source sentence-level parallel corpus of ancient and modern Chinese is constructed. Because seq2seq-based machine translation models do not perform well on this task, the pre-trained model UNILM is applied instead, given the monolingual nature of the task. In addition, Guwen-BERT, a model pre-trained on ancient Chinese, is used to further improve performance. Translation quality is assessed by human evaluation and two automatic metrics: (a) case-sensitive BLEU and (b) Imagery Conservation (I.C), a metric first proposed in this paper. Experimental results under all metrics show that the method generates higher-quality translations.
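To make the role of UNILM more concrete, the following is a minimal sketch of the sequence-to-sequence self-attention mask generally associated with UNILM-style models: source tokens attend to each other bidirectionally, while target tokens attend to the full source and only to target tokens at or before their own position. The function name `seq2seq_attention_mask` and the plain list-of-lists representation are illustrative assumptions. The abstract does not describe the authors' implementation, so this should not be read as their code.

```python
def seq2seq_attention_mask(src_len, tgt_len):
    """Build a square 0/1 mask of size (src_len + tgt_len).

    mask[i][j] == 1 means position i may attend to position j.
    Hypothetical helper for illustration only.
    """
    total = src_len + tgt_len
    mask = [[0] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < src_len:
                # Every position may attend to the whole source segment.
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # Target positions attend to themselves and earlier targets only.
                mask[i][j] = 1
    return mask


if __name__ == "__main__":
    # Example: a 3-token ancient-Chinese source and a 2-token modern target.
    for row in seq2seq_attention_mask(3, 2):
        print(row)
```

With this masking, a single Transformer encoder can be fine-tuned for generation, which is one reason such a model suits a task where source and target share much of the same vocabulary.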