Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

Ye Zhang, Qian Leng, Mengran Zhu, Rui Ding, Yue Wu, Jintong Song, Yulu Gong
2024-06-01
Abstract: The rapid advancement of Large Language Models (LLMs) has ushered in an era where AI-generated text is increasingly indistinguishable from human-generated content. Detecting AI-generated text has become imperative to combat misinformation, ensure content authenticity, and safeguard against malicious uses of AI. In this paper, we propose a novel hybrid approach that combines traditional TF-IDF techniques with advanced machine learning models, including Bayesian classifiers, Stochastic Gradient Descent (SGD), Categorical Gradient Boosting (CatBoost), and 12 instances of DeBERTa-v3-large models. Our approach aims to address the challenges associated with detecting AI-generated text by leveraging the strengths of both traditional feature extraction methods and state-of-the-art deep learning models. Through extensive experiments on a comprehensive dataset, we demonstrate the effectiveness of our proposed method in accurately distinguishing between human and AI-generated text. Our approach achieves superior performance compared to existing methods. This research contributes to the advancement of AI-generated text detection techniques and lays the foundation for developing robust solutions to mitigate the challenges posed by AI-generated content.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the problem of detecting AI-generated text, in response to the challenge that AI-generated content poses to information authenticity. As large language models such as GPT-3 improve, their output is becoming increasingly difficult to distinguish from human writing. This affects fields such as news, social media, education, and business, and creates risks of manipulating public opinion and eroding trust in digital communication channels. To address this, the researchers propose a hybrid approach that combines traditional TF-IDF feature extraction with machine learning models: Bayesian classifiers, Stochastic Gradient Descent (SGD), CatBoost, and 12 instances of the DeBERTa-v3-large model, which is used here as a detector rather than as a text generator. The researchers report that the method distinguishes effectively between human and AI-generated text and that it outperforms existing techniques in accuracy. The study contributes to AI-generated text detection and lays groundwork for mitigating the challenges posed by AI-generated content.
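To make the "TF-IDF features plus lightweight classifiers" half of the hybrid approach concrete, here is a minimal stdlib-only sketch under stated assumptions. It implements TF-IDF (raw counts times a smoothed IDF, L2-normalised), a multinomial Naive Bayes classifier as one example of a Bayesian classifier, and logistic regression trained by SGD, then averages their probabilities. CatBoost and the 12 DeBERTa-v3-large models are omitted. The tokenizer, the smoothing constant, the learning rate, the equal ensemble weights, the helper names (`SimpleTfidf`, `TfidfNaiveBayes`, `SGDLogistic`, `ensemble_proba`) and the toy sentences are all hypothetical illustrations, not the authors' configuration.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase + whitespace split (not the paper's preprocessing).
    return text.lower().split()

class SimpleTfidf:
    """Raw term counts x smoothed IDF ln((1 + n) / (1 + df)) + 1, then L2 normalisation."""
    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # document frequency: each term once per doc
        terms = sorted(df)
        self.vocab = {t: i for i, t in enumerate(terms)}
        self.idf = [math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in terms]
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            counts = Counter(t for t in tokenize(doc) if t in self.vocab)  # unseen terms dropped
            vec = {self.vocab[t]: c * self.idf[self.vocab[t]] for t, c in counts.items()}
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            rows.append({i: v / norm for i, v in vec.items()})  # sparse {index: weight}
        return rows

class TfidfNaiveBayes:
    """Multinomial Naive Bayes on TF-IDF weights; labels 0 = human, 1 = AI (both must occur)."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, X, y, dim):
        self.log_prior, self.log_lik = {}, {}
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = [self.alpha] * dim
            for x in rows:
                for i, v in x.items():
                    totals[i] += v
            s = sum(totals)
            self.log_lik[c] = [math.log(t / s) for t in totals]
        return self

    def predict_proba(self, x):
        score = {c: self.log_prior[c] + sum(v * self.log_lik[c][i] for i, v in x.items())
                 for c in (0, 1)}
        # P(AI | x) = sigmoid(score_1 - score_0), clipped to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(max(-30.0, min(30.0, score[0] - score[1]))))

class SGDLogistic:
    """Logistic regression fitted by plain per-sample SGD on the log-loss (no regularisation)."""
    def __init__(self, lr=0.5, epochs=50, seed=0):
        self.lr, self.epochs, self.seed = lr, epochs, seed

    def fit(self, X, y, dim):
        self.w, self.b = [0.0] * dim, 0.0
        rng = random.Random(self.seed)
        order = list(range(len(X)))
        for _ in range(self.epochs):
            rng.shuffle(order)
            for k in order:
                g = self.predict_proba(X[k]) - y[k]  # gradient of log-loss w.r.t. the logit
                for i, v in X[k].items():
                    self.w[i] -= self.lr * g * v
                self.b -= self.lr * g
        return self

    def predict_proba(self, x):
        z = self.b + sum(self.w[i] * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def ensemble_proba(weighted_models, x):
    # Weighted average of each model's P(AI | x); the weights are illustrative assumptions.
    total = sum(w for _, w in weighted_models)
    return sum(w * m.predict_proba(x) for m, w in weighted_models) / total

# Toy sentences for illustration only (0 = human-written, 1 = AI-generated).
docs = [
    "honestly i just wrote this quickly lol",
    "my dog ate my homework again today",
    "in conclusion it is important to note that",
    "furthermore it is essential to consider the implications",
]
labels = [0, 0, 1, 1]

vec = SimpleTfidf().fit(docs)
X = vec.transform(docs)
dim = len(vec.vocab)
ensemble = [(TfidfNaiveBayes().fit(X, labels, dim), 0.5),
            (SGDLogistic().fit(X, labels, dim), 0.5)]
query = vec.transform(["it is important to consider this"])[0]
print(f"P(AI) = {ensemble_proba(ensemble, query):.3f}")
```

Averaging probabilities (soft voting) is only one plausible way to combine the models. The summary does not say how the paper merges its TF-IDF classifiers with the DeBERTa outputs, so further models would most simply enter `ensemble_proba` as additional weighted terms.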