A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization

Tharindu Kumarage, Garima Agrawal, Paras Sheth, Raha Moraffah, Aman Chadha, Joshua Garland, Huan Liu
2024-03-02
Abstract: We have witnessed lately a rapid proliferation of advanced Large Language Models (LLMs) capable of generating high-quality text. While these LLMs have revolutionized text generation across various domains, they also pose significant risks to the information ecosystem, such as the potential for generating convincing propaganda, misinformation, and disinformation at scale. This paper offers a review of AI-generated text forensic systems, an emerging field addressing the challenges of LLM misuse. We present an overview of the existing efforts in AI-generated text forensics by introducing a detailed taxonomy, focusing on three primary pillars: detection, attribution, and characterization. These pillars enable a practical understanding of AI-generated text, from identifying AI-generated content (detection), to determining the specific AI model involved (attribution), to grouping the underlying intents of the text (characterization). Furthermore, we explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
Large language models (LLMs) have developed rapidly and can now generate high-quality text, but they also bring significant risks, such as producing convincing propaganda, misinformation, and disinformation at scale. These risks threaten the information ecosystem, in particular public trust and the foundations of democracy. The paper therefore reviews forensic systems for AI-generated text, an emerging field dedicated to analyzing, understanding, and mitigating the misuse of LLMs. Specifically, it focuses on three aspects:

1. **Detection**: Identifying whether a text was written by a human or generated by AI, a fundamental step in protecting the integrity of information.
2. **Attribution**: Tracing AI-generated content to the specific model that produced it, which enhances transparency and accountability.
3. **Characterization**: Understanding the intent behind AI-generated text, which is crucial for preventing harmful content.

Through these three pillars, the paper provides a practical understanding of AI-generated text and surveys the available research resources, open challenges, and future directions. The review aims to organize current research efforts, identify gaps, and promote the further development of forensic systems for AI-generated text, thereby fostering a more robust, transparent, and responsible digital information ecosystem.