Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training.

Xinyan Xiao, Yang Liu, Qun Liu, Shouxun Lin
2011-01-01
Abstract: Although discriminative training promises to improve statistical machine translation by incorporating a large number of overlapping features, it is hard to scale up to large data because of decoding complexity. We propose a new algorithm that generates translation forests for training data in linear time with the help of word alignment. Our algorithm also alleviates the oracle selection problem by ensuring that a forest always contains derivations that exactly yield the reference translation. With millions of features trained on 519K sentences at 0.03 seconds per sentence, our system achieves a significant improvement of 0.84 BLEU over the baseline system on the NIST Chinese-English test sets.