Refining parallel quality for machine translation

Huimin Gong, Xiangyu Duan, Min Zhang
DOI: https://doi.org/10.1109/IALP.2015.7451519
2015-01-01
Abstract: Parallel sentences are crucial for training machine translation systems, but they are not always truly parallel in terms of translation quality or parallel granularity. Non-parallelism arises from two sources. The first is automatic sentence alignment, which introduces non-parallel sentence pairs as noise into the parallel corpus. The second is linguistic difference between the two languages: for example, one Chinese sentence (with multiple clauses) may correspond to multiple English sentences, which makes it difficult to define the parallel granularity between the two languages. We propose methods that address each of these two sources of non-parallelism in order to improve the quality of the parallel corpus. Experiments show that a parallel corpus processed by our methods benefits the training of a machine translation system, yielding an improvement of 1.13 BLEU points over a system that does not use our methods.