Abstract: This paper explores the use of general-purpose machine translation (MT) to help users of computer-aided translation (CAT) systems based on translation memory (TM) identify which target words in a translation proposal need to be changed (either replaced or removed) and which should be kept unedited, a task we term "word-keeping recommendation". MT is used as a black box to align source and target sub-segments on the fly in the translation units (TUs) suggested to the user. The source-language (SL) and target-language (TL) segments in the matching TUs are split into overlapping sub-segments of variable length and machine-translated into the TL and the SL, respectively. The resulting bilingual sub-segments, together with the matching between the SL segment in the TU and the segment to be translated, are used to build features from which a binary classifier determines the target words to be changed and those to be kept unedited. In this approach, MT output is never presented to the translator. Two approaches are presented: one using a word-keeping recommendation system that can be trained on the TM used with the CAT system, and a more basic approach that does not require any training. Experiments are conducted by simulating the translation of texts in several language pairs, with corpora belonging to different domains and using three different MT systems. We compare the performance obtained to that of previous work that used statistical word alignment for word-keeping recommendation, and show that the MT-based approaches presented in this paper are more accurate in most scenarios. In particular, our results confirm that the MT-based approaches are better than the alignment-based approach when using models trained on out-of-domain TMs. Additional experiments were performed to check how dependent the MT-based recommender is on the language pair and MT system used for training. These experiments confirm a high degree of reusability of the recommendation models across various MT systems, but a low level of reusability across language pairs.
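The sub-segment step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `sub_segments`, `keep_votes`, `recommend` and `toy_mt` are hypothetical, the dictionary `toy_mt` merely stands in for a black-box MT system, only the SL-to-TL direction is covered, and the simple voting rule is an assumed stand-in for a training-free recommender (the abstract does not specify the actual rule or the classifier features).

```python
from typing import Callable, List, Optional, Sequence, Tuple

Span = Tuple[str, ...]


def sub_segments(words: Sequence[str], max_len: int) -> List[Tuple[int, Span]]:
    """Return (start, span) for every contiguous span of 1..max_len words."""
    spans = []
    for length in range(1, max_len + 1):
        for start in range(len(words) - length + 1):
            spans.append((start, tuple(words[start:start + length])))
    return spans


def find_all(haystack: Sequence[str], needle: Span) -> List[int]:
    """Return every start index at which needle occurs in haystack."""
    n = len(needle)
    return [i for i in range(len(haystack) - n + 1)
            if tuple(haystack[i:i + n]) == needle]


def keep_votes(source: Sequence[str], target: Sequence[str],
               matched: Sequence[bool], translate: Callable[[Span], Span],
               max_len: int = 3) -> List[List[int]]:
    """Collect [keep, change] votes per target word (assumed heuristic).

    matched[i] is True when source word i is also matched in the new
    segment to be translated.
    """
    votes = [[0, 0] for _ in target]
    for start, sigma in sub_segments(source, max_len):
        tau = translate(sigma)  # black-box MT call (hypothetical stand-in)
        if not tau:
            continue
        flags = matched[start:start + len(sigma)]
        for pos in find_all(target, tau):
            for j in range(pos, pos + len(tau)):
                if all(flags):          # fully matched source span
                    votes[j][0] += 1
                elif not any(flags):    # fully unmatched source span
                    votes[j][1] += 1
    return votes


def recommend(votes: List[List[int]]) -> List[Optional[str]]:
    """Majority vote per target word; None when the evidence is tied."""
    return ["keep" if k > c else "change" if c > k else None
            for k, c in votes]


# Toy data: TU ("the red house", "la casa roja"), new segment "the blue house".
toy_mt = {("the",): ("la",), ("red",): ("roja",), ("house",): ("casa",),
          ("red", "house"): ("casa", "roja")}
votes = keep_votes(["the", "red", "house"], ["la", "casa", "roja"],
                   [True, False, True], lambda s: toy_mt.get(s, ()))
print(recommend(votes))  # ['keep', 'keep', 'change']
```

In this toy case only "roja" is flagged for change, because the one sub-segment that translates to it covers an unmatched source word. A trained variant would presumably feed counts of this kind to the binary classifier as features instead of taking a direct vote.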

The first Automatic Translation Memory Cleaning Shared Task

Creating Domain-Specific Translation Memories for Machine Translation Fine-tuning: The TRENCARD Bilingual Cardiology Corpus

Automatic Correction of Human Translations

Translation Memory Retrieval Methods

Netmarble AI Center's WMT21 Automatic Post-Editing Shared Task Submission

A Shared Task on Bandit Learning for Machine Translation

The University of Helsinki submissions to the WMT19 news translation task

Using Machine Translation to Provide Target-Language Edit Hints in Computer Aided Translation Based on Translation Memories

Automated Testing for Machine Translation Via Constituency Invariance

Bilingual Synchronization: Restoring Translational Relationships with Editing Operations

Improving CAT Tools in the Translation Workflow: New Approaches and Evaluation

Findings of the WMT 2024 Shared Task on Chat Translation

Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

Translation Quality Assessment: A Brief Survey on Manual and Automatic Methods

TSMind: Alibaba and Soochow University's Submission to the WMT22 Translation Suggestion Task

Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation

Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task

Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task

Findings of the Second Workshop on Automatic Simultaneous Translation

A Large-Scale Evaluation of Pre-editing Strategies for Improving User-Generated Content Translation

Report of NEWS 2009 Machine Transliteration Shared Task