Abstract: Translation quality estimation is critical to reducing post-editing effort in machine translation and to cleaning cross-lingual corpora. As a research problem, quality estimation (QE) aims to estimate the quality of a translation directly from a pair of source and target sentences, and to highlight the words that need correction, without reference to gold-standard translations. In this paper, we propose Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. Verdi uses two word predictors, a transformer-based neural machine translation (NMT) model and a pre-trained cross-lingual language model (XLM), to extract diverse features from a pair of sentences for subsequent quality estimation. We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor, which handles a primal task and a dual task simultaneously with weight sharing, leading to stronger context prediction ability than single-direction NMT models. Taking advantage of the dual learning scheme, we further design a novel feature that directly encodes the translated target information without relying on the source context. Extensive experiments on the WMT20 QE tasks demonstrate that our method beats the winner of the competition and outperforms other baseline methods by a large margin. We further use the sentence-level scores provided by Verdi to clean a parallel corpus and observe benefits in both model performance and training efficiency.
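
The corpus-cleaning use of sentence-level scores mentioned in the abstract can be illustrated with a minimal sketch. It is not Verdi's implementation: `clean_corpus`, `length_ratio_score`, and the `max_score` threshold are hypothetical names introduced here, and the sketch assumes that a lower score means less predicted post-editing effort. The toy scorer only stands in for a real QE model.

```python
from typing import Callable, Iterable, List, Tuple

SentencePair = Tuple[str, str]


def clean_corpus(
    pairs: Iterable[SentencePair],
    score_fn: Callable[[str, str], float],
    max_score: float,
) -> List[SentencePair]:
    """Keep pairs whose predicted post-editing effort is at most max_score.

    Assumption: lower scores indicate better translations.
    """
    return [(src, tgt) for src, tgt in pairs if score_fn(src, tgt) <= max_score]


def length_ratio_score(src: str, tgt: str) -> float:
    """Toy stand-in scorer (NOT a QE model): token-count mismatch in [0, 1]."""
    n_src, n_tgt = len(src.split()), len(tgt.split())
    longest = max(n_src, n_tgt)
    return 0.0 if longest == 0 else abs(n_src - n_tgt) / longest


corpus = [
    ("the cat sits", "die Katze sitzt"),
    ("hello", "dies ist ein sehr langer Satz"),
]
cleaned = clean_corpus(corpus, length_ratio_score, max_score=0.5)
```

In practice `score_fn` would wrap a trained sentence-level QE model, and the threshold would be tuned on held-out data rather than fixed by hand.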

Removing Input Confounder for Translation Quality Estimation via a Causal Motivated Method

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling

Denoising Pre-training for Machine Translation Quality Estimation with Curriculum Learning

Translation Error Detection as Rationale Extraction

NJUNLP's Submission for CCMT20 Quality Estimation Task

Multi-view fusion for universal translation quality estimation

DirectQE: Direct Pretraining for Machine Translation Quality Estimation

Beyond Glass-Box Features: Uncertainty Quantification Enhanced Quality Estimation for Neural Machine Translation

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

Information Dropping Data Augmentation for Machine Translation Quality Estimation

Better Simultaneous Translation with Monotonic Knowledge Distillation

Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model

Debiasing NLU Models via Causal Intervention and Counterfactual Reasoning

Unsupervised Quality Estimation for Neural Machine Translation

Self-Supervised Quality Estimation for Machine Translation

Submissions for the WMT 19 Quality Estimation Shared Task

Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation

DAmcqrnn: An approach to censored monotone composite quantile regression neural network estimation

Verdi: Quality Estimation and Error Detection for Bilingual Corpora

On the Impact of Noises in Crowd-Sourced Data for Speech Translation

DeepSubQE: Quality estimation for subtitle translations