SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov
2024-04-22
Abstract: We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual track. Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM. Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine. The task attracted a large number of participants: subtask A monolingual (126), subtask A multilingual (59), subtask B (70), and subtask C (30). In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For all subtasks, the best systems used LLMs.
Computation and Language
What problem does this paper attempt to address?
This paper reports on SemEval-2024 Task 8, a shared task on detecting machine-generated text across multiple generators, domains, and languages. The task is motivated by the misuse risks that come with the widespread use of LLMs; it aims to help safeguard information accuracy and to advance machine-generated text detection. It comprises three subtasks. Subtask A is binary classification: deciding whether a text was written by a human or generated by a machine, with a monolingual track on English texts and a multilingual track. Subtask B is source detection: determining whether a text was written by a human or generated by a specific large language model (LLM). Subtask C is change point detection: given a text that mixes human and machine writing, locating the point at which authorship switches from human to machine. Participation was high: 126 teams in the monolingual track of subtask A, 59 in the multilingual track, 70 in subtask B, and 30 in subtask C. The best-performing system in every subtask used LLMs. The paper surveys the submitted systems, which include both supervised and unsupervised techniques, provides large evaluation datasets, analyzes the results, and outlines directions for future research.
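To make the three output types concrete, here is a minimal Python sketch of what a prediction could look like in each subtask. It is an illustration under assumptions, not the organizers' data format: the label names, the `MixedText` class, the word-level indexing of the change point, and the absolute-difference helper are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical label sets, for illustration only; the summary does not list them.
BINARY_LABELS = ("human", "machine")         # subtask A: human vs. machine
SOURCE_LABELS = ("human", "llm_a", "llm_b")  # subtask B: human or a specific LLM


@dataclass
class MixedText:
    """A text that starts human-written and continues machine-generated."""
    words: List[str]
    change_point: int  # index of the first machine-written word (assumed word-level)


def split_at_change_point(sample: MixedText) -> Tuple[List[str], List[str]]:
    """Return (human_part, machine_part) for a subtask C style sample."""
    if not 0 <= sample.change_point <= len(sample.words):
        raise ValueError("change_point must lie within the text")
    return sample.words[:sample.change_point], sample.words[sample.change_point:]


def change_point_error(predicted: int, gold: int) -> int:
    """Distance in words between a predicted and a gold change point."""
    return abs(predicted - gold)


if __name__ == "__main__":
    sample = MixedText(words="the cat sat on the mat".split(), change_point=3)
    human_part, machine_part = split_at_change_point(sample)
    print(human_part, machine_part)
    print(change_point_error(predicted=5, gold=3))
```

Under these assumptions, subtasks A and B reduce to picking one label per text, while subtask C predicts a single position inside the text, which is why it needs a distance-style comparison rather than a label match.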