Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

Ye Zhang, Qian Leng, Mengran Zhu, Rui Ding, Yue Wu, Jintong Song, Yulu Gong

2024-06-01

Abstract: The rapid advancement of Large Language Models (LLMs) has ushered in an era where AI-generated text is increasingly indistinguishable from human-generated content. Detecting AI-generated text has become imperative to combat misinformation, ensure content authenticity, and safeguard against malicious uses of AI. In this paper, we propose a novel hybrid approach that combines traditional TF-IDF techniques with advanced machine learning models, including Bayesian classifiers, Stochastic Gradient Descent (SGD), Categorical Gradient Boosting (CatBoost), and 12 instances of DeBERTa-v3-large models. Our approach aims to address the challenges of detecting AI-generated text by leveraging the strengths of both traditional feature extraction methods and state-of-the-art deep learning models. Through extensive experiments on a comprehensive dataset, we demonstrate the effectiveness of our proposed method in accurately distinguishing between human-written and AI-generated text. Our approach achieves superior performance compared to existing methods. This research contributes to the advancement of AI-generated text detection techniques and lays the foundation for developing robust solutions to mitigate the challenges posed by AI-generated content.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the problem of detecting AI-generated text, a response to the threat that such content poses to information authenticity. With the development of large language models such as GPT-3, AI-generated text is becoming increasingly difficult to distinguish from human-written content. This affects fields such as news, social media, education, and business, and it carries the risk of manipulating public opinion and eroding trust in digital communication channels. To address the problem, the authors propose a hybrid approach that combines traditional TF-IDF feature extraction with machine learning models: Bayesian classifiers, Stochastic Gradient Descent (SGD), CatBoost, and 12 instances of DeBERTa-v3-large. The authors report that the method distinguishes human-written from AI-generated text effectively and that, in their experiments, it outperforms existing techniques. The study contributes to AI-generated text detection and lays a foundation for mitigating the challenges posed by AI-generated content.
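
The summary above names the components of the hybrid approach but gives no implementation details. As a rough illustration of the TF-IDF and Bayesian-classifier part only, the following is a minimal standard-library sketch, not the authors' code. The whitespace tokenizer, the toy training sentences, the smoothing constant, and the blend() helper with its weight are all assumptions made for this example; the SGD, CatBoost, and DeBERTa-v3-large components are omitted.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercased whitespace tokens (assumption; a real pipeline would use a proper tokenizer).
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}

def tfidf(text, idf):
    # Term frequency times IDF; tokens unseen during fitting are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    return {tok: c * idf[tok] for tok, c in counts.items()}

def fit_naive_bayes(docs, labels, idf, alpha=1.0):
    # Multinomial naive Bayes over TF-IDF weights with additive smoothing.
    classes = sorted(set(labels))
    weight = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        weight[label].update(tfidf(doc, idf))
    model = {}
    for c in classes:
        total = sum(weight[c].values()) + alpha * len(idf)
        log_prior = math.log(labels.count(c) / len(labels))
        log_like = {tok: math.log((weight[c][tok] + alpha) / total) for tok in idf}
        model[c] = (log_prior, log_like)
    return model

def predict_proba(text, idf, model):
    # Class posterior via a numerically stable softmax over log scores.
    vec = tfidf(text, idf)
    scores = {c: prior + sum(w * like[tok] for tok, w in vec.items())
              for c, (prior, like) in model.items()}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}

def blend(p_tfidf, p_transformer, w=0.5):
    # Hypothetical weighted average of two "AI-generated" probabilities.
    return w * p_tfidf + (1 - w) * p_transformer

# Toy data, invented for illustration only.
train_docs = [
    "honestly the movie was kinda boring lol",
    "we grabbed coffee and missed the bus again",
    "as an ai language model i cannot browse the internet",
    "in conclusion it is important to note that overall",
]
train_labels = ["human", "human", "ai", "ai"]

idf = fit_idf(train_docs)
model = fit_naive_bayes(train_docs, train_labels, idf)
p_ai = predict_proba("it is important to note", idf, model)["ai"]
print(f"TF-IDF naive Bayes P(ai) = {p_ai:.3f}")
```

In a hybrid system of the kind described, a score like p_ai would be one of several inputs. The blend() helper shows one simple way to combine it with a transformer-based score, though the source does not say how the paper actually weights or combines its models.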