Towards Non-Monotonic Sentence Alignment

Xiaojun Quan, Chunyu Kit
DOI: https://doi.org/10.1016/j.ins.2015.06.028
IF: 8.1
2015-01-01
Information Sciences
Abstract: All previous work on sentence alignment has been founded on the monotonicity assumption that coupled sentences occur in a similar sequential order on the two sides of bilingual parallel corpora (i.e., bitexts), overlooking the non-monotonicity found in naturally occurring bitexts. This paper presents the very first attempt to specifically address this practical issue in sentence alignment by taking advantage of two observations: (1) an initial (or seed) alignment can be obtained using accessible lexical resources, and (2) sentences with high affinity in one language tend to have counterparts with similar affinity in the other. These observations are incorporated as two constraints into semi-supervised learning, yielding a novel and generalized solution for both monotonic and non-monotonic sentence alignment. Evaluations on real-world data from two distant domains, together with an end-to-end MT evaluation, show that while the performance of representative monotonic aligners degrades more severely as the degree of non-monotonicity rises, our approach maintains stable and competitive performance across the full spectrum of non-monotonicity.
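The monotonicity assumption and the two observations can be made concrete with a small sketch. It is not the authors' formulation, which the abstract does not spell out. `count_crossings` is a hypothetical, crude measure of non-monotonicity (the number of link pairs whose order differs on the two sides), not the paper's measure. `propagate_alignment` swaps in a generic two-sided label-propagation update: a seed alignment matrix `seed` (observation 1, e.g. scores derived from a bilingual lexicon) is spread along within-language affinity matrices `w_src` and `w_tgt` (observation 2). All function names, the update rule, and the parameters `alpha` and `iters` are assumptions for illustration.

```python
from itertools import combinations

import numpy as np


def count_crossings(links):
    """Count link pairs (i, j), (k, l) ordered differently on the two sides.

    Zero means the alignment is monotonic. Hypothetical measure for
    illustration only, not the paper's degree of non-monotonicity.
    """
    return sum(
        1 for (i, j), (k, l) in combinations(links, 2) if (i - k) * (j - l) < 0
    )


def normalize(w):
    """Symmetric normalization D^-1/2 W D^-1/2 of an affinity matrix."""
    d = np.asarray(w, dtype=float).sum(axis=1)
    # Sentences with no affinity to any other sentence get a zero row/column.
    inv_sqrt = np.divide(1.0, np.sqrt(d), out=np.zeros_like(d), where=d > 0)
    return inv_sqrt[:, None] * w * inv_sqrt[None, :]


def propagate_alignment(w_src, w_tgt, seed, alpha=0.5, iters=100):
    """Hypothetical two-sided label propagation of seed alignment scores.

    w_src: (m, m) affinities among source sentences.
    w_tgt: (n, n) affinities among target sentences.
    seed:  (m, n) initial alignment scores (e.g. from lexical resources).
    A pair (i, j) gains score when similar source sentences are linked to
    similar target sentences; sentence positions are never consulted.
    """
    s_src, s_tgt = normalize(w_src), normalize(w_tgt)
    y0 = np.asarray(seed, dtype=float)
    f = y0.copy()
    for _ in range(iters):
        # F <- alpha * S_src F S_tgt^T + (1 - alpha) * Y0; this converges for
        # 0 <= alpha < 1 with symmetric, non-negative affinity matrices.
        f = alpha * (s_src @ f @ s_tgt.T) + (1.0 - alpha) * y0
    return f
```

Reading off each row's argmax of the propagated matrix gives one candidate target per source sentence, and `count_crossings` on those links reports how non-monotonic the result is. Because the update uses only affinities and seed scores, nothing in it favours monotonic links, which is the property a non-monotonic aligner needs.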