Chinese Ancient-Modern Sentence Alignment

Zhun Lin, Xiaojie Wang
DOI: https://doi.org/10.1007/978-3-540-72586-2_164
2007-01-01
Abstract: Bi-text alignment is useful for many natural language processing tasks, such as machine translation, bilingual lexicography, and word sense disambiguation. Most previous research has addressed pairs of different languages. This paper presents a diachronic alignment of Ancient and Modern Chinese. Because of the long history of Chinese culture and writing, many Ancient Chinese texts are still waiting to be translated into Modern Chinese; moreover, the comparative study of Ancient and Modern Chinese is an important way to understand some characteristics of Modern Chinese. After describing some characteristics of Ancient-Modern Chinese bi-texts, we first investigate some statistical properties of an Ancient-Modern bi-text corpus, including a correlation test of text lengths between the two languages and a distribution test of the length-ratio data. We then pay more attention to n-m (n>1 or m>1) alignment modes, which are prone to mismatch.
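The two statistical checks named in the abstract (length correlation and length-ratio distribution) can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson` and `length_stats` are hypothetical, length is assumed to be measured in characters, and the toy pairs are placeholder strings with made-up lengths rather than real corpus data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def length_stats(pairs):
    """Return (length correlation, mean length ratio, sample std of ratio).

    `pairs` is a list of (ancient, modern) aligned sentence strings.
    Length is counted in characters (an assumption of this sketch).
    """
    anc = [len(a) for a, _ in pairs]
    mod = [len(m) for _, m in pairs]
    ratios = [m / a for a, m in zip(anc, mod)]
    mean = sum(ratios) / len(ratios)
    var = sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1)
    return pearson(anc, mod), mean, math.sqrt(var)

# Placeholder pairs with made-up lengths (ancient, modern); not real data.
toy_pairs = [
    ("古" * 4, "今" * 6),
    ("古" * 6, "今" * 10),
    ("古" * 10, "今" * 14),
    ("古" * 8, "今" * 13),
]

r, ratio_mean, ratio_std = length_stats(toy_pairs)
print(f"correlation={r:.3f} mean_ratio={ratio_mean:.3f} std_ratio={ratio_std:.3f}")
```

A high correlation together with a roughly bell-shaped ratio distribution is the usual precondition for length-based sentence alignment; whether the authors' corpus meets it is what the tests in the paper examine.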