Abstract: This paper explores the use of general-purpose machine translation (MT) to help users of computer-aided translation (CAT) systems based on translation memory (TM) identify which target words in a translation proposal need to be changed (either replaced or removed) and which can be kept unedited, a task we term "word-keeping recommendation". MT is used as a black box to align source and target sub-segments on the fly in the translation units (TUs) suggested to the user. The source-language (SL) and target-language (TL) segments in the matching TUs are split into overlapping sub-segments of variable length and machine-translated into the TL and the SL, respectively. The resulting bilingual sub-segments, together with the matching between the SL segment in the TU and the segment to be translated, are used to build the features with which a binary classifier determines which target words should be changed and which should be kept unedited. In this approach, MT output is never presented to the translator. Two approaches are presented: one using a word-keeping recommendation system that can be trained on the TM used with the CAT system, and a more basic one that requires no training. Experiments simulate the translation of texts in several language pairs, using corpora from different domains and three different MT systems. We compare the performance obtained with that of previous work that used statistical word alignment for word-keeping recommendation, and show that the MT-based approaches presented in this paper are more accurate in most scenarios. In particular, our results confirm that the MT-based approaches outperform the alignment-based approach when using models trained on out-of-domain TMs. Additional experiments check how dependent the MT-based recommender is on the language pair and MT system used for training. These experiments confirm a high degree of reusability of the recommendation models across MT systems, but a low level of reusability across language pairs.
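
The sub-segment idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the voting rule, the function names (`subsegments`, `matched_positions`, `recommend`), and the dictionary `toy_table` standing in for the black-box MT system are all assumptions made for illustration, and only the SL-to-TL direction is shown. The sketch splits the SL side of a TU into overlapping sub-segments, "translates" each one, locates the translation in the TL side, and lets each located sub-segment vote "keep" or "change" for the target words it covers, depending on whether its source words are matched in the segment to be translated.

```python
from difflib import SequenceMatcher


def subsegments(tokens, max_len):
    """Yield (start, end) spans of every contiguous sub-segment of 1..max_len tokens."""
    for length in range(1, max_len + 1):
        for start in range(len(tokens) - length + 1):
            yield start, start + length


def matched_positions(tu_source, new_source):
    """Positions of tu_source tokens that are matched, in order, in new_source."""
    matcher = SequenceMatcher(a=tu_source, b=new_source, autojunk=False)
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    return matched


def find_spans(haystack, needle):
    """All (start, end) spans where the token list needle occurs in haystack."""
    n = len(needle)
    return [(i, i + n) for i in range(len(haystack) - n + 1)
            if haystack[i:i + n] == needle]


def recommend(tu_source, tu_target, new_source, translate, max_len=3):
    """Label each target word 'keep', 'change', or 'unknown' by simple voting.

    Assumption: a plain majority vote, used here as a stand-in for a
    training-free recommender; the abstract does not specify the rule.
    """
    matched = matched_positions(tu_source, new_source)
    keep_votes = [0] * len(tu_target)
    change_votes = [0] * len(tu_target)
    for start, end in subsegments(tu_source, max_len):
        translation = translate(tu_source[start:end])
        if not translation:
            continue  # the MT stand-in produced nothing for this sub-segment
        fully_matched = all(i in matched for i in range(start, end))
        for t_start, t_end in find_spans(tu_target, translation):
            for j in range(t_start, t_end):
                if fully_matched:
                    keep_votes[j] += 1
                else:
                    change_votes[j] += 1
    labels = []
    for keep, change in zip(keep_votes, change_votes):
        if keep == change:
            labels.append("unknown")  # no evidence, or conflicting evidence
        else:
            labels.append("keep" if keep > change else "change")
    return labels


# Hypothetical stand-in for a black-box MT system (English -> Spanish).
toy_table = {
    ("the",): ["el"],
    ("red",): ["rojo"],
    ("car",): ["coche"],
    ("is",): ["es"],
    ("fast",): ["rápido"],
    ("is", "fast"): ["es", "rápido"],
}


def toy_translate(tokens):
    """Look up a sub-segment in the toy table; None when it is not covered."""
    return toy_table.get(tuple(tokens))


tu_source = "the red car is fast".split()
tu_target = "el coche rojo es rápido".split()
new_source = "the blue car is fast".split()

labels = recommend(tu_source, tu_target, new_source, toy_translate)
print(list(zip(tu_target, labels)))
```

In this toy case only "rojo" is flagged for change, because "red" is the only source word with no match in the new segment. A trained recommender, as described in the abstract, would instead turn this kind of evidence into features for a binary classifier.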

Using Machine Translation to Provide Target-Language Edit Hints in Computer Aided Translation Based on Translation Memories