Comparative Study of Word Alignment Heuristics and Phrase-Based SMT

Hua Wu, Haifeng Wang
2007-01-01
Abstract: This paper compares six word alignment heuristics and their impact on translation quality. We also propose a method for filtering noise from the phrase tables extracted by these heuristics, and we examine the effectiveness of combining the methods. Experiments are performed on the Europarl corpus, for which a multilingual in-domain training corpus, an in-domain test set, and an out-of-domain test set are available. Results indicate that (1) the heuristics show similar tendencies in the word alignment task on both test sets, but perform differently in the translation task on the in-domain and out-of-domain test sets; (2) in general, the relationship between word alignment performance and machine translation performance is difficult to predict, since it depends on the domains of the training and test corpora as well as on other factors such as the evaluation metrics and the characteristics of the translation system; (3) noise filtering and combination of the heuristic methods yield larger improvements on the out-of-domain test set than on the in-domain test set.