A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization

Tharindu Kumarage, Garima Agrawal, Paras Sheth, Raha Moraffah, Aman Chadha, Joshua Garland, Huan Liu

2024-03-02

Abstract: We have witnessed lately a rapid proliferation of advanced Large Language Models (LLMs) capable of generating high-quality text. While these LLMs have revolutionized text generation across various domains, they also pose significant risks to the information ecosystem, such as the potential for generating convincing propaganda, misinformation, and disinformation at scale. This paper offers a review of AI-generated text forensic systems, an emerging field addressing the challenges of LLM misuses. We present an overview of the existing efforts in AI-generated text forensics by introducing a detailed taxonomy, focusing on three primary pillars: detection, attribution, and characterization. These pillars enable a practical understanding of AI-generated text, from identifying AI-generated content (detection), determining the specific AI model involved (attribution), and grouping the underlying intents of the text (characterization). Furthermore, we explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

Large language models (LLMs) can now generate high-quality text, but that capability also brings significant risks, such as convincing propaganda, misinformation, and disinformation produced at scale. These risks threaten the information ecosystem, in particular public trust and the foundations of democracy. The paper therefore reviews forensic systems for AI-generated text, an emerging field dedicated to analyzing, understanding, and mitigating the misuse of LLMs. It focuses on three aspects:

1. **Detection**: Identifying whether a text was written by a human or generated by AI, a fundamental step in protecting the integrity of information.
2. **Attribution**: Tracing AI-generated content to the specific model that produced it, which improves transparency and accountability.
3. **Characterization**: Understanding the intent behind AI-generated text, which is crucial for preventing harmful content.

Through these three pillars, the paper provides a practical understanding of AI-generated text and surveys the available research resources, open challenges, and future directions. The review aims to organize current research efforts, identify gaps, and promote further development of forensic systems for AI-generated text, thereby fostering a more robust, transparent, and responsible digital information ecosystem.

Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

Can AI-Generated Text be Reliably Detected?

Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text

On the Possibilities of AI-Generated Text Detection

Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index

Generative AI Text Classification using Ensemble LLM Approaches

Exploring AI Text Generation, Retrieval-Augmented Generation, and Detection Technologies: a Comprehensive Overview

The imitation game: Detecting human and AI-generated texts in the era of ChatGPT and BARD

Neural Authorship Attribution: Stylometric Analysis on Large Language Models

A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions

The Science of Detecting LLM-Generated Texts

LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

Stylometric Detection of AI-Generated Text in Twitter Timelines

Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts

Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities

StyloAI: Distinguishing AI-Generated Content with Stylometric Analysis

Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions