Target Oriented Data Generation for Quality Estimation of Machine Translation.

Huanqin Wu, Muyun Yang, Jiaqi Wang, Junguo Zhu, Tiejun Zhao
DOI: https://doi.org/10.1007/978-3-030-32233-5_31
2019-01-01
Abstract: Quality estimation (QE) is a non-trivial issue for machine translation (MT), and the neural approach appears to be a promising solution to this task. Annotating QE training corpora is costly but necessary for supervised QE systems. To provide informative, large-scale training data for MT quality estimation models, this paper proposes an approach to generating pseudo QE training data. By leveraging the labeled corpus provided for the task, the method generates pseudo training samples whose distribution of translation errors is similar to that of the labeled corpus. The paper also describes a sentence-specific data expansion strategy that incrementally boosts model performance. Experiments on different open datasets and models confirm the effectiveness of the method and indicate that it can significantly improve QE performance.