Filtering Training Corpus and Improving Word Alignment for Statistical Machine Translation

Tiejun Zhao
2013-01-01
Abstract: Word alignment is one of the most important steps in statistical machine translation systems. Both translation models and reordering models are built from the word alignment result, so word alignment errors propagate into these models and may even be amplified there. To reduce word alignment errors, the paper proposes a corpus filtering approach based on alignment perplexity, together with an improved discriminative word alignment algorithm. The corpus filtering approach removes sentence pairs that contain crucial alignment errors. Compared with the traditional word alignment algorithm, the improved algorithm produces alignments with a lower alignment error rate.
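The two quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define alignment perplexity, so the definition used here (the exponential of the negative mean log-probability of a sentence pair's alignment links) is an assumption, as are the function names, the input layout, and the threshold. The alignment error rate follows the commonly used definition over sure and possible gold links, which may differ in detail from the paper's evaluation setup.

```python
import math


def alignment_perplexity(link_probs):
    """Assumed definition: exp of the negative mean log-probability of the links.

    link_probs holds one model probability per alignment link of a sentence pair.
    A pair with no links or a non-positive probability gets infinite perplexity.
    """
    if not link_probs or any(p <= 0.0 for p in link_probs):
        return float("inf")
    mean_log = sum(math.log(p) for p in link_probs) / len(link_probs)
    return math.exp(-mean_log)


def filter_corpus(pairs, threshold):
    """Keep the ids of sentence pairs whose perplexity is at most threshold.

    pairs is a list of (pair_id, link_probs) tuples (hypothetical layout).
    """
    return [
        pair_id
        for pair_id, link_probs in pairs
        if alignment_perplexity(link_probs) <= threshold
    ]


def alignment_error_rate(hypothesis, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S treated as a subset of P.

    Each argument is a set of (source_index, target_index) links.
    """
    possible = possible | sure  # sure links also count as possible links
    denominator = len(hypothesis) + len(sure)
    if denominator == 0:
        return 0.0  # nothing predicted and nothing expected
    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / denominator
```

With this sketch, a pair whose links all have probability 0.5 has perplexity 2, and a lower threshold discards more of the corpus. In practice the threshold would need tuning on held-out data, since dropping too many pairs shrinks the training corpus.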