MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

Marina Fomicheva, Shuo Sun, Erick Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia, André F. T. Martins
DOI: https://doi.org/10.48550/arXiv.2010.04480
2021-10-11
Abstract: We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains eleven language pairs, with human labels for up to 10,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.
Computation and Language
What problem does this paper attempt to address?
The paper addresses four shortcomings of existing datasets for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE):

1. **Lack of transparent MT models**: Existing QE datasets do not give access to the MT system that generated the translations, so QE methods cannot use its internal state or confidence information. This limits the application of so-called "glass-box" methods.
2. **Single-mode quality assessment**: Current datasets are labelled either with direct human assessment or with the differences between translations and their post-edited versions (such as HTER, or marking words as OK/BAD), but not with both. As a result, the correlation between the two kinds of label is unclear.
3. **Uneven resource distribution**: Most existing datasets concentrate on high-resource language pairs, for which translation quality is usually high. There is less data for medium- and low-resource language pairs, even though these pairs need QE more.
4. **Domain limitations**: Existing datasets mostly cover specific domains (such as IT or life sciences) and are translated with domain-specific MT models. Most sentences are therefore translated well, which makes it hard to reflect the challenges of real-life scenarios.

To address these problems, the authors introduce MLQE-PE, a multilingual quality estimation and post-editing dataset that aims to overcome the limitations of existing datasets and give researchers more comprehensive and diverse data.

### Features of the MLQE-PE dataset

- **Open NMT models**: It provides the state-of-the-art Neural Machine Translation (NMT) models used to generate the translations, so researchers can use a model's uncertainty or internal state for quality estimation.
- **Combination of two assessment methods**: It includes both Direct Assessment (DA) and post-editing effort (HTER), so translation quality can be measured from two perspectives.
- **Document-level context**: It contains the titles of the Wikipedia articles from which the source sentences were taken, so document-level context can be considered when predicting sentence-level or word-level translation quality.
- **Coverage of multiple language pairs**: It includes 11 language pairs spanning high-, medium-, and low-resource settings, which supports data diversity and wide applicability.

Together, these features make MLQE-PE a richer and more comprehensive resource for research on MT quality estimation and automatic post-editing.
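
As a rough illustration of how post-editing effort (HTER) and word-level OK/BAD labels can both be derived from a translation and its post-edit, the sketch below aligns the two with plain word-level Levenshtein distance. It is a simplified approximation and not the authors' pipeline: it ignores the block shifts that full TER counts as single edits, it only tags MT tokens (missing words are counted as edits but not tagged), and the function names `align_edits` and `hter` as well as the example sentences are hypothetical.

```python
from typing import List, Tuple


def align_edits(mt: List[str], pe: List[str]) -> Tuple[int, List[str]]:
    """Return (edit count, OK/BAD tag per MT token) for an MT output and its post-edit.

    Assumption: plain word-level Levenshtein alignment, without TER shift operations.
    """
    n, m = len(mt), len(pe)
    # dist[i][j] = minimum number of edits turning mt[:i] into pe[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if mt[i - 1] == pe[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # delete an MT token
                dist[i][j - 1] + 1,             # insert a post-edit token
                dist[i - 1][j - 1] + sub_cost,  # keep or substitute
            )

    # Walk back through the table; MT tokens that were substituted or deleted are BAD.
    tags = ["OK"] * n
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and mt[i - 1] == pe[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1                 # token kept unchanged
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            tags[i - 1] = "BAD"                 # token substituted
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            tags[i - 1] = "BAD"                 # token deleted
            i -= 1
        else:
            j -= 1                              # word inserted by the post-editor
    return dist[n][m], tags


def hter(mt: List[str], pe: List[str]) -> float:
    """Simplified HTER: number of edits divided by the post-edit length."""
    edits, _ = align_edits(mt, pe)
    return edits / max(len(pe), 1)


if __name__ == "__main__":
    mt_tokens = "a cat sit on the mat".split()
    pe_tokens = "the cat sat on the mat".split()
    edits, tags = align_edits(mt_tokens, pe_tokens)
    print(edits, tags)                 # 2 ['BAD', 'OK', 'BAD', 'OK', 'OK', 'OK']
    print(round(hter(mt_tokens, pe_tokens), 3))
```

A sentence that needs no post-editing gets a score of 0 and all-OK tags, while heavier editing pushes the score up. This also shows why such labels can diverge from direct assessment, which scores perceived quality rather than edit distance.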