Abstract: Detecting the translation direction of parallel text has applications for machine translation training and evaluation, but also has forensic applications such as resolving plagiarism or forgery allegations. In this work, we explore an unsupervised approach to translation direction detection based on the simple hypothesis that $p(\text{translation}|\text{original})>p(\text{original}|\text{translation})$, motivated by the well-known simplification effect in translationese or machine-translationese. In experiments with massively multilingual machine translation models across 20 translation directions, we confirm the effectiveness of the approach for high-resource language pairs, achieving document-level accuracies of 82–96% for NMT-produced translations, and 60–81% for human translations, depending on the model used. Code and demo are available at
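
The hypothesis can be read as a decision rule: score the pair in both directions with the same NMT model and choose the direction under which the pair is more probable. The sketch below assumes the per-token log-probabilities have already been computed by some NMT model (that scoring step is not shown) and averages them per token to offset length differences. The length normalization, the `SentencePair` container, and the function names are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SentencePair:
    """Hypothetical container for one parallel sentence pair (a, b).

    The token log-probabilities are assumed to come from a multilingual
    NMT model run in both directions on the same pair.
    """
    logprobs_a_to_b: Sequence[float]  # log p(b | a), one value per token of b
    logprobs_b_to_a: Sequence[float]  # log p(a | b), one value per token of a


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-probability (an assumption of this sketch)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def detect_direction(pair: SentencePair) -> str:
    """Return 'a->b' if a is predicted to be the original, else 'b->a'.

    Applies p(translation | original) > p(original | translation): the
    direction that makes the pair more probable is taken as the true
    translation direction. Ties are resolved as 'b->a'.
    """
    score_ab = mean_logprob(pair.logprobs_a_to_b)
    score_ba = mean_logprob(pair.logprobs_b_to_a)
    return "a->b" if score_ab > score_ba else "b->a"


# Toy numbers for illustration only, not real model output.
pair = SentencePair(
    logprobs_a_to_b=[-0.2, -0.4, -0.3],        # b is easy to predict from a
    logprobs_b_to_a=[-0.9, -1.1, -0.7, -1.0],  # a is harder to predict from b
)
print(detect_direction(pair))  # a->b
```

Because the rule only compares two scores from one model, it needs no labeled training data, which is what makes the approach unsupervised.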

What problem does this paper attempt to address?

The paper addresses the problem of translation direction detection: given a pair of parallel texts, determine which side is the original and which is the translation. It explores an unsupervised method that uses neural machine translation (NMT) models to make this decision automatically. The task matters for machine translation training and evaluation, and for forensic applications such as resolving plagiarism or forgery allegations. The main contributions of the paper are:

1. **A simple, unsupervised method**: The translation direction is identified by comparing the probabilities an NMT model assigns to the pair in each direction. The method assumes that a translation is more probable given its original than the original is given its translation, i.e., $p(\text{translation}|\text{original}) > p(\text{original}|\text{translation})$, motivated by the simplification effect observed in translationese.
2. **Evidence of effectiveness**: Experiments show that the method reliably detects the translation direction of text produced by neural machine translation and also works, to a lesser degree, for human translations.
3. **Qualitative analysis and a forensic case study**: The authors analyze the detection performance qualitatively and apply the method to a real forensic case, checking whether an English book was a forgery; the results support specific hypotheses about the book's authenticity.
4. **Directional bias**: The paper discusses potential directional bias in multilingual translation models and proposes a way to quantify it.

For high-resource language pairs, document-level accuracy reaches 82–96% for NMT-produced translations and 60–81% for human translations, depending on the model used. For output from older, pre-neural translation systems, accuracy falls below chance, possibly because these systems produce lower-quality text.
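
Directional bias means that a model prefers one direction regardless of which side is actually the original. One simple diagnostic, offered as an illustration and not necessarily the quantification the paper proposes, is to run the detector on a test set balanced between both true directions and measure how far the share of predictions for one direction deviates from 50%. The function name and input format are hypothetical.

```python
from typing import Sequence


def direction_skew(predictions: Sequence[str], direction: str = "x->y") -> float:
    """Share of predictions equal to `direction`, minus 0.5.

    Assumes `predictions` come from a test set containing equally many
    pairs originally written in each language. A detector without
    directional bias gives a skew near 0; positive values mean it
    over-predicts `direction`, negative values mean it under-predicts it.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    share = sum(1 for p in predictions if p == direction) / len(predictions)
    return share - 0.5


# Hypothetical predictions on 4 pairs (2 originally x->y, 2 originally y->x).
preds = ["x->y", "x->y", "x->y", "y->x"]
print(direction_skew(preds))  # 0.25
```

A skew of 0.25 here means the detector labels three quarters of a balanced set as x->y, so at least one of the y->x originals must have been misclassified.
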
The study also compares how different models perform across language pairs and finds that document-level detection is generally more accurate than sentence-level detection. Finally, the forensic case study illustrates the method's practical value.
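
Document-level detection can pool evidence from all sentence pairs in a document before deciding; intuitively, averaging over many sentences smooths out noisy sentence-level decisions. The sketch below averages the per-sentence differences between the two directions' length-normalized log-probabilities. This aggregation rule and the input format are assumptions for illustration, not necessarily the paper's exact procedure.

```python
from typing import Sequence, Tuple

# Each sentence pair is given as (log p(b|a) token values, log p(a|b) token values),
# assumed to be precomputed with an NMT model.
SentenceScores = Tuple[Sequence[float], Sequence[float]]


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-probability of one sentence."""
    return sum(token_logprobs) / len(token_logprobs)


def detect_document(sentences: Sequence[SentenceScores]) -> str:
    """Return 'a->b' if side a of the document is predicted to be the original."""
    if not sentences:
        raise ValueError("document must contain at least one sentence pair")
    # Positive difference = this sentence favors a->b.
    diffs = [mean_logprob(ab) - mean_logprob(ba) for ab, ba in sentences]
    return "a->b" if sum(diffs) / len(diffs) > 0 else "b->a"


# Toy document: two sentences favor a->b, one favors b->a.
doc = [
    ([-0.3, -0.5], [-0.9, -0.8]),  # diff = -0.40 - (-0.85) = +0.45
    ([-0.6], [-0.7]),              # diff = -0.60 - (-0.70) = +0.10
    ([-1.0, -1.2], [-0.8, -0.6]),  # diff = -1.10 - (-0.70) = -0.40
]
print(detect_document(doc))  # a->b
```

A sentence-level decision would get the third pair "wrong" in this toy example, while the document-level average still points to a->b.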

Machine Translation Models are Zero-Shot Detectors of Translation Direction
