Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection

Teodor-George Marchitan, Claudiu Creanga, Liviu P. Dinu
2024-05-28
Abstract: This paper describes the approach of the UniBuc-NLP team in tackling SemEval 2024 Task 8: Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection. We explored transformer-based and hybrid deep learning architectures. For subtask B, our transformer-based model achieved a strong second place out of 77 teams with an accuracy of 86.95%, demonstrating the architecture's suitability for this task. However, our models showed overfitting in subtask A, which could potentially be fixed with less fine-tuning and an increased maximum sequence length. For subtask C (token-level classification), our hybrid model overfit during training, hindering its ability to detect transitions between human and machine-generated text.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the problem of distinguishing human-written text from AI-generated text. Specifically, the research team (UniBuc-NLP) participated in SemEval 2024 Task 8, a multigenerator, multidomain, and multilingual black-box machine-generated text detection challenge. Tools that can tell these two types of text apart help maintain the authenticity and integrity of information, prevent the spread of misinformation, and keep content sources traceable. This is crucial for combating unethical uses of AI such as propaganda, misinformation, deepfakes, and social manipulation. The team employed Transformer-based and hybrid deep learning architectures to tackle the different subtasks. For Subtask B, their Transformer-based model achieved second place with an accuracy of 86.95%, demonstrating the suitability of this architecture for the task. In Subtask A, however, the models overfit, and in Subtask C (token-level classification) the hybrid model also overfit during training, which hindered its ability to detect transitions between human and machine-generated text.