A Non-Contiguous Tree Sequence Alignment-Based Model for Statistical Machine Translation

Jun Sun, Min Zhang, Chew Lim Tan
DOI: https://doi.org/10.3115/1690219.1690275
2009-01-01
Abstract: The tree sequence-based translation model allows a rule to violate syntactic boundaries in order to capture non-syntactic phrases, where a tree sequence is a contiguous sequence of subtrees. This paper goes further and presents a translation model based on non-contiguous tree sequence alignment, where a non-contiguous tree sequence is a sequence of subtrees and gaps. Compared with the contiguous tree sequence-based model, the proposed model can handle non-contiguous phrases with gaps of any size by means of non-contiguous tree sequence alignment. A decoding algorithm targeting non-contiguous constituents is also proposed. Experimental results on the NIST MT-05 Chinese-English translation task show that the proposed model outperforms the baseline systems by a statistically significant margin.
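The difference between a contiguous and a non-contiguous tree sequence can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `SubTree`, `Gap`, `build_sequence`, and `is_contiguous` are hypothetical, and representing each subtree only by its root label and the word span it covers is a simplifying assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SubTree:
    """Hypothetical stand-in for a subtree: root label plus covered word span."""
    label: str
    start: int  # index of the first covered word (inclusive)
    end: int    # index of the last covered word (inclusive)


@dataclass(frozen=True)
class Gap:
    """A run of uncovered words between two subtrees (inclusive indices)."""
    start: int
    end: int


def build_sequence(subtrees: List[SubTree]) -> List[Union[SubTree, Gap]]:
    """Order subtrees by position and insert a Gap wherever two
    neighbouring subtrees do not cover adjacent words."""
    ordered = sorted(subtrees, key=lambda t: t.start)
    sequence: List[Union[SubTree, Gap]] = []
    for tree in ordered:
        if sequence:
            prev = sequence[-1]  # always a SubTree at this point
            if tree.start <= prev.end:
                raise ValueError("subtree spans overlap")
            if tree.start > prev.end + 1:
                sequence.append(Gap(prev.end + 1, tree.start - 1))
        sequence.append(tree)
    return sequence


def is_contiguous(sequence: List[Union[SubTree, Gap]]) -> bool:
    """A tree sequence is contiguous when it contains no gaps."""
    return not any(isinstance(item, Gap) for item in sequence)


# Adjacent spans: an ordinary (contiguous) tree sequence.
contiguous = build_sequence([SubTree("VV", 0, 0), SubTree("NP", 1, 2)])

# Words 1-2 are left uncovered: a sequence of subtrees and a gap.
non_contiguous = build_sequence([SubTree("NP", 3, 4), SubTree("VV", 0, 0)])
```

In this sketch the gap may span any number of words, which mirrors the abstract's claim that the model places no limit on gap size; how gaps are aligned and decoded is specific to the paper and is not modelled here.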