Abstract: Word-level Quality Estimation (QE) of Machine Translation (MT) aims to identify potential translation errors in a translated sentence without a reference translation. Conventional work on word-level QE typically predicts translation quality in terms of post-editing effort, where the word labels ("OK" and "BAD") are generated automatically by comparing the words of the MT sentence with those of the post-edited sentence using a Translation Error Rate (TER) toolkit. While post-editing effort measures translation quality to some extent, we find that it usually conflicts with human judgement on whether a word is well or poorly translated. To overcome this limitation, we first create a golden benchmark dataset, namely \emph{HJQE} (Human Judgement on Quality Estimation), in which expert translators directly annotate poorly translated words based on their own judgement. Additionally, to make further use of the parallel corpus, we propose self-supervised pre-training with two tag-correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, which bring the TER-based artificial QE corpus closer to \emph{HJQE}. We conduct extensive experiments on the publicly available WMT En-De and En-Zh corpora. The results show that the proposed dataset is more consistent with human judgement and confirm the effectiveness of the proposed tag-correcting strategies.\footnote{The data can be found at \url{https://github.com/ZhenYangIACAS/HJQE}.}
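The automatic labelling that the abstract attributes to conventional word-level QE can be illustrated with a minimal sketch. It is not the TER toolkit itself: Python's standard `difflib.SequenceMatcher` is used here as a simplified stand-in for the TER alignment (it does not model TER's shift operation), and the function name `tag_mt_words` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def tag_mt_words(mt_tokens, pe_tokens):
    """Tag each MT token "OK" if it is aligned to an identical token in the
    post-edited sentence, and "BAD" otherwise.

    Assumption: a longest-matching-block alignment approximates the
    edit-distance alignment a TER toolkit would produce (no shifts).
    """
    tags = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block marks a run of MT tokens kept unchanged by the post-editor.
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


# Hypothetical MT output and its post-edited version.
mt = "he go to the school yesterday".split()
pe = "he went to school yesterday".split()

for token, tag in zip(mt, tag_mt_words(mt, pe)):
    print(f"{token}\t{tag}")
```

Under this scheme a word is tagged "BAD" whenever the post-editor changed or removed it, whether or not a human would judge it to be a translation error, which is the mismatch the abstract describes.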

Target Oriented Data Generation for Quality Estimation of Machine Translation

Information Dropping Data Augmentation for Machine Translation Quality Estimation

Self-Supervised Quality Estimation for Machine Translation

Unsupervised Quality Estimation for Neural Machine Translation

Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search

QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation

DirectQE: Direct Pretraining for Machine Translation Quality Estimation

A New Tool for Efficiently Generating Quality Estimation Datasets

Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement

From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation

Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation

Beyond Glass-Box Features: Uncertainty Quantification Enhanced Quality Estimation for Neural Machine Translation

QE-EBM: Using Quality Estimators as Energy Loss for Machine Translation

Practical Perspectives on Quality Estimation for Machine Translation

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

Original or Translated? On the Use of Parallel Data for Translation Quality Estimation

Submissions for the WMT 19 Quality Estimation Shared Task

Ensemble-based Transfer Learning for Low-resource Machine Translation Quality Estimation

Quality Estimation of Machine Translated Texts based on Direct Evidence from Training Data

Denoising Pre-training for Machine Translation Quality Estimation with Curriculum Learning