Abstract: Translated texts exhibit systematic linguistic differences compared to original texts in the same language, and these differences are referred to as translationese. Translationese affects various cross-lingual natural language processing tasks, potentially leading to biased results. In this paper, we explore a novel approach to reducing translationese in translated texts: translation-based style transfer. As there are no parallel human-translated and original data in the same language, we use a self-supervised approach that can learn from comparable (rather than parallel) monolingual original and translated data. However, even this self-supervised approach requires some parallel data for validation. We show how we can eliminate the need for parallel validation data by combining the self-supervised loss with an unsupervised loss. This unsupervised loss combines the loss of an original-text language model over the style-transferred output with a semantic similarity loss between the input and the style-transferred output. We evaluate our approach in terms of original vs. translationese binary classification, in addition to measuring content preservation and target-style fluency. The results show that our approach reduces translationese classifier accuracy to the level of a random classifier after style transfer, while adequately preserving content and fluency in the target original style.
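The joint objective in the abstract can be illustrated as a scoring function. The Python sketch below assumes that per-token log-probabilities of the output under a language model trained on original text, and fixed sentence embeddings for input and output, are already available. The helper names (`lm_loss`, `semantic_loss`, `unsupervised_loss`, `joint_loss`, `select_checkpoint`), the weights `alpha`, `beta` and `lam`, and the choice of 1 minus cosine similarity as the semantic loss are all hypothetical, not the paper's exact formulation. It also shows how an unsupervised score can stand in for parallel validation data when choosing a checkpoint.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One dev example (hypothetical layout): token log-probs of the output under an
# original-text LM, embedding of the input, embedding of the style-transferred output.
Example = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def lm_loss(token_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the output under an original-text LM.
    Lower means the output looks more like original (non-translated) text."""
    return -sum(token_logprobs) / len(token_logprobs)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_loss(src_emb: Sequence[float], out_emb: Sequence[float]) -> float:
    """1 - cosine similarity: 0 when the output keeps the input's meaning."""
    return 1.0 - cosine(src_emb, out_emb)


def unsupervised_loss(example: Example, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical weighted sum of the style/fluency term and the content term."""
    token_logprobs, src_emb, out_emb = example
    return alpha * lm_loss(token_logprobs) + beta * semantic_loss(src_emb, out_emb)


def joint_loss(self_supervised: float, unsupervised: float, lam: float = 1.0) -> float:
    """Hypothetical combination of the self-supervised and unsupervised losses."""
    return self_supervised + lam * unsupervised


def select_checkpoint(dev_outputs: Dict[str, List[Example]]) -> str:
    """Return the checkpoint whose outputs on translated-only dev data have the
    lowest mean unsupervised loss; no parallel reference text is required."""
    def mean_loss(examples: List[Example]) -> float:
        return sum(unsupervised_loss(ex) for ex in examples) / len(examples)
    return min(dev_outputs, key=lambda name: mean_loss(dev_outputs[name]))


dev = {
    "ckpt_a": [([-1.0, -1.0], [1.0, 0.0], [1.0, 0.0])],  # less fluent, meaning kept
    "ckpt_b": [([-0.5, -0.5], [1.0, 0.0], [0.0, 1.0])],  # more fluent, meaning lost
}
print(select_checkpoint(dev))  # ckpt_a
```

The toy dev set shows why both terms matter: `ckpt_b` scores better on the language model term alone, but the similarity term penalizes it for drifting from the input's meaning. In actual training, the language model term would typically need to be made differentiable with respect to the generator's discrete outputs; this sketch only scores finished outputs.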

What problem does this paper attempt to address?

This paper addresses "translationese" in translated texts: the systematic linguistic differences that translated texts exhibit compared to original texts in the same language. These differences can affect cross-lingual natural language processing tasks and lead to biased results. To tackle this problem, the researchers propose translation-based style transfer, a method that reduces translationese in translated texts without parallel data. Specifically, it applies a self-supervised neural machine translation system to the style transfer task, learning from comparable rather than parallel original and translated data. Because parallel human-translated and original data do not exist, the researchers further propose a joint self-supervised and unsupervised learning criterion. Its unsupervised part combines a language model loss and a semantic similarity loss, which removes the need for parallel data during validation as well as training. The main contributions are:

1. Framing, for the first time, the reduction of translationese in human-translated texts as a monolingual translation style transfer task, which allows the surface form of the generated output to be evaluated directly.
2. Introducing a joint self-supervised and unsupervised learning criterion that requires no parallel original-translation data for training or validation.
3. Showing experimentally that the method reduces the accuracy of translationese classifiers to the level of a random classifier, indicating that the output no longer carries a translationese signal the classifier can detect.
4. Providing extensive quantitative and qualitative analyses of how well the method mitigates translationese while preserving content and fluency.
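The classifier-based evaluation described above rests on a simple check: take a binary original-vs-translationese classifier and measure its accuracy on original texts and on style-transferred translations. On a balanced test set a random classifier scores 0.5, so accuracy close to 0.5 means the classifier can no longer separate the two. The Python sketch below assumes a balanced set; `classifier_accuracy`, the label constants and the keyword-based `toy_classifier` are hypothetical illustrations, and a real evaluation would use a trained classifier instead of the toy rule.

```python
from typing import Callable, Sequence, Tuple

ORIGINAL, TRANSLATED = 0, 1  # hypothetical label encoding


def classifier_accuracy(
    classify: Callable[[str], int],
    originals: Sequence[str],
    transferred: Sequence[str],
) -> Tuple[float, float]:
    """Accuracy of a binary original-vs-translationese classifier on a balanced
    test set, plus its absolute distance from the 0.5 chance level."""
    if len(originals) != len(transferred):
        raise ValueError("the 0.5 chance level assumes a balanced test set")
    # Style-transferred sentences keep the TRANSLATED label: they come from translations.
    labelled = [(s, ORIGINAL) for s in originals] + [(s, TRANSLATED) for s in transferred]
    correct = sum(classify(s) == gold for s, gold in labelled)
    accuracy = correct / len(labelled)
    return accuracy, abs(accuracy - 0.5)


def toy_classifier(sentence: str) -> int:
    """Hypothetical stand-in: flags a sentence as translated if it starts with
    'furthermore'."""
    return TRANSLATED if sentence.lower().startswith("furthermore") else ORIGINAL


originals = ["we went home early", "it rained all day"]
transferred = ["we then went home", "furthermore it rained"]
acc, gap = classifier_accuracy(toy_classifier, originals, transferred)
print(acc, gap)  # 0.75 0.25
```

Here the toy classifier still catches one of the two transferred sentences, so it sits 0.25 above chance; a successful style transfer would push that gap toward zero.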
In summary, this paper proposes an approach to mitigating translationese in translated texts, which matters for cross-lingual natural language processing tasks, where translationese can bias results.

Translating away Translationese without Parallel Data

UATST: Towards Unpaired Arbitrary Text-Guided Style Transfer with Cross-Space Modulation

Original or Translated? On the Use of Parallel Data for Translation Quality Estimation

Language Style Transfer from Non-Parallel Text with Arbitrary Styles

Style Transfer as Unsupervised Machine Translation

So Different Yet So Alike! Constrained Unsupervised Text Style Transfer

Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance

Transductive Learning for Unsupervised Text Style Transfer

Non-Parallel Text Style Transfer Using Self-Attentional Discriminator As Supervisor

A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer

Utilizing Non-Parallel Text for Style Transfer by Making Partial Comparisons

Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus

StoryTrans: Non-Parallel Story Author-Style Transfer with Discourse Representations and Content Enhancing

Lost in Translationese? Reducing Translation Effect Using Abstract Meaning Representation

Language Style Transfer from Sentences with Arbitrary Unknown Styles

Lost in Machine Translation: A Method to Reduce Meaning Loss

Translationese in Machine Translation Evaluation

Style Transfer in Text: Exploration and Evaluation

Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer

Style Transfer with Multi-iteration Preference Optimization

Contextual Text Style Transfer