Generating Virtual Parallel Corpus: A Compatibility Centric Method

Jia Xu, Weiwei Sun
2011-01-01
Abstract: The processing of many natural languages suffers from scarce linguistic resources. We introduce the idea of compatibility to extend training data for machine translation: if the translation hypotheses produced by multiple systems are measured as compatible, they are considered reliable predictions. In this way, we generate virtual parallel data via a bridge language, and retraining on this corpus improves our machine translation quality by more than 30% relative.
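The abstract does not specify how compatibility is measured, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a corpus of (source, bridge-language) sentence pairs, several translation systems that map bridge-language sentences into the target language, and a hypothetical compatibility test: every pair of hypotheses must reach a unigram F1 of at least `threshold`. The names `unigram_f1`, `is_compatible`, `consensus`, and `build_virtual_corpus` are illustrative, and the toy "systems" are plain dictionary lookups standing in for real MT engines.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def unigram_f1(a: str, b: str) -> float:
    """Hypothetical compatibility score: token-level unigram F1 of two hypotheses."""
    ta, tb = a.split(), b.split()
    if not ta or not tb:
        return 0.0
    # Clipped token overlap (multiset intersection).
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(ta), overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def is_compatible(hyps: Sequence[str], threshold: float) -> bool:
    """Assumption: hypotheses are compatible if every pair scores >= threshold."""
    return all(unigram_f1(a, b) >= threshold for a, b in combinations(hyps, 2))


def consensus(hyps: Sequence[str]) -> str:
    """Pick the hypothesis with the highest average agreement with the others."""
    def avg_score(i: int) -> float:
        scores = [unigram_f1(hyps[i], h) for j, h in enumerate(hyps) if j != i]
        return sum(scores) / len(scores) if scores else 0.0
    return hyps[max(range(len(hyps)), key=avg_score)]


def build_virtual_corpus(
    bridge_corpus: Sequence[Tuple[str, str]],
    systems: Sequence[Callable[[str], str]],
    threshold: float = 0.6,
) -> List[Tuple[str, str]]:
    """Pair each source sentence with a target hypothesis when systems agree.

    bridge_corpus: hypothetical (source, bridge-language) sentence pairs.
    systems: hypothetical bridge-to-target translators.
    """
    virtual = []
    for source, bridge in bridge_corpus:
        hyps = [translate(bridge) for translate in systems]
        if is_compatible(hyps, threshold):  # keep only "reliable" predictions
            virtual.append((source, consensus(hyps)))
    return virtual


if __name__ == "__main__":
    # Toy systems: dictionary lookups standing in for real MT systems.
    toy_a = {"bridge one": "a b c d", "bridge two": "x y z"}
    toy_b = {"bridge one": "a b c e", "bridge two": "p q r"}
    corpus = [("source one", "bridge one"), ("source two", "bridge two")]
    print(build_virtual_corpus(corpus, [toy_a.__getitem__, toy_b.__getitem__]))
```

In the toy run, the two systems largely agree on the first sentence (unigram F1 of 0.75), so it enters the virtual corpus, while their disjoint outputs for the second sentence are discarded. The threshold trades corpus size against reliability; any other agreement metric could be substituted for the hypothetical `unigram_f1`.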