Aligning Sentences Between Comparable Texts of Different Styles

Xiwen Chen, Mengxue Zhang, Kenny Qili Zhu
DOI: https://doi.org/10.1007/978-981-15-3412-6_6
2020-01-01
Abstract: Monolingual parallel corpora are crucial for training and evaluating text rewriting or paraphrasing models. Aligning parallel sentences between two large bodies of text is a key step toward the automatic construction of such parallel corpora. We propose a greedy alignment algorithm that makes use of strong unsupervised similarity measures. The algorithm aligns sentences with state-of-the-art accuracy while being more robust on corpora with special linguistic features. Using this alignment algorithm, we automatically constructed a large English parallel corpus from various translated works of classic literature.
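
The abstract does not spell out the similarity measures or the exact greedy procedure, so the following is only a minimal sketch of what greedy sentence alignment can look like in general. The token-overlap (Jaccard) similarity is a simple stand-in rather than the paper's measure, and the names `jaccard`, `greedy_align`, and `threshold` are hypothetical, chosen for illustration.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity. A stand-in, NOT the paper's measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def greedy_align(
    src: List[str], tgt: List[str], threshold: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Greedy one-to-one alignment (illustrative assumption, not the authors' algorithm).

    Scores every candidate pair, then accepts pairs from most to least
    similar, skipping any pair whose source or target is already aligned.
    """
    scored = [
        (jaccard(s, t), i, j)
        for i, s in enumerate(src)
        for j, t in enumerate(tgt)
    ]
    # Highest score first; indices break ties deterministically.
    scored.sort(key=lambda x: (-x[0], x[1], x[2]))

    used_src, used_tgt = set(), set()
    pairs = []
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining pairs score lower still
        if i in used_src or j in used_tgt:
            continue
        used_src.add(i)
        used_tgt.add(j)
        pairs.append((i, j, score))
    return sorted(pairs)
```

In this sketch, two renderings of the same passage would be passed in as lists of sentences, and the result is a list of `(source_index, target_index, score)` triples. Scoring all pairs is quadratic in the number of sentences, so a practical system for large texts would likely restrict candidates to a window around the expected position; that refinement is also an assumption, not something stated in the abstract.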