Abstract: Large language models (LLMs) have advanced to the point where even humans have difficulty discerning whether a text was written by another human or generated by a computer. However, knowing whether a text was produced by a human or by artificial intelligence (AI) is important for determining its trustworthiness, and has applications in many domains, including detecting fraud and academic dishonesty and combating the spread of misinformation and political propaganda. The task of AI-generated text (AIGT) detection is therefore both very challenging and highly critical. In this survey, we summarize state-of-the-art approaches to AIGT detection, including watermarking, statistical and stylistic analysis, and machine learning classification. We also provide information about existing datasets for this task. Synthesizing the research findings, we aim to provide insight into the salient factors that combine to determine how "detectable" AIGT is under different scenarios, and to make practical recommendations for future work on this significant technical and societal challenge.
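Watermarking, the first approach named in the abstract, embeds a hidden statistical signal at generation time that a holder of the secret key can later test for. The sketch below follows the general idea of the green-list watermark of Kirchenbauer et al. (2023) and is not the survey's own method: the constants `GAMMA` and `KEY`, the helper names, and the word-level "tokens" are hypothetical, and hashing each (previous token, token) pair against `GAMMA` is a simplification of partitioning the full vocabulary. Detection is a one-proportion z-test on how many tokens fall in the green list.

```python
import hashlib
import math

# Hypothetical parameters for this sketch (not taken from the survey).
GAMMA = 0.25              # expected fraction of "green" tokens in unwatermarked text
KEY = b"demo-secret-key"  # secret key shared by the generator and the detector


def is_green(prev_token: str, token: str, key: bytes = KEY, gamma: float = GAMMA) -> bool:
    """Pseudo-randomly mark `token` as green, seeded by the key and the previous token.

    Simplification: each (prev_token, token) pair is hashed to a number in [0, 1)
    and compared with gamma, instead of partitioning the whole vocabulary.
    """
    payload = key + b"\x00" + prev_token.encode("utf-8") + b"\x00" + token.encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def generate_watermarked(start: str, vocab: list[str], length: int,
                         key: bytes = KEY, gamma: float = GAMMA) -> list[str]:
    """Toy 'generator' that always picks a green token when one exists (a hard watermark)."""
    tokens = [start]
    for step in range(length):
        greens = [w for w in vocab if is_green(tokens[-1], w, key, gamma)]
        # Fall back to an arbitrary token if no green candidate exists.
        tokens.append(greens[step % len(greens)] if greens else vocab[0])
    return tokens


def watermark_z_score(tokens: list[str], key: bytes = KEY, gamma: float = GAMMA) -> float:
    """One-proportion z-test: how far the green count exceeds its expectation gamma * T."""
    t = len(tokens) - 1  # number of scored tokens (each needs a predecessor)
    if t < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, w, key, gamma) for p, w in zip(tokens, tokens[1:]))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(200)]
    marked = generate_watermarked("w0", vocab, 50)
    print("z with the right key:", round(watermark_z_score(marked), 2))
    print("z with a wrong key:  ", round(watermark_z_score(marked, key=b"other"), 2))
```

Under these assumptions, text produced with the key should score far above a typical threshold such as z = 4, while the same text checked with a different key should usually score near zero. This also shows why watermarking only helps when the generator cooperates: the signal exists only if it was embedded during generation.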

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper primarily examines the problem of detecting AI-generated text (AIGT) and reviews and analyzes current research in this field. Specifically:

1. **Background and Importance**:
   - As large language models (LLMs) improve, it is becoming increasingly difficult to tell whether a text was written by a human or generated by a computer.
   - Determining the source of a text is crucial for assessing its credibility, especially for detecting fraud and academic misconduct and for combating the spread of misinformation.
2. **Research Objectives**:
   - Summarize current state-of-the-art AIGT detection methods, including watermarking, statistical and stylistic analysis, and machine learning classification.
   - Describe existing datasets and synthesize research findings to identify the key factors that affect how detectable AIGT is.
   - Offer practical recommendations for future work on this significant technical and societal challenge.
3. **Main Content**:
   - **Task Definition**: Define the AIGT detection task and discuss its key characteristics.
   - **Classification**: Categorize AIGT into types ranging from fully automated text to text with substantial human intervention.
   - **Detection Scenarios**: Describe the different types of detection scenarios and how they differ.
   - **Method Overview**: Introduce current NLP methods, grouped into watermarking, statistical and stylistic analysis, and classification with pre-trained language models.
   - **Datasets**: List existing datasets available for training and testing AIGT detection systems.
   - **Influencing Factors**: Discuss factors that affect the difficulty of AIGT detection, such as the characteristics of the generating model, text length, and adversarial strategies.
   - **Conclusions and Recommendations**: Summarize the research findings and propose directions for future research.
Through this review, the paper aims to give researchers and practitioners a comprehensive guide for choosing the most appropriate detection methods and training datasets for specific applications. As LLMs become more prevalent in daily life, AIGT detection will become an increasingly important problem that calls for collaborative effort.
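The statistical and stylistic analysis family in the method overview works from measurable properties of the text itself. As a rough illustration, the sketch below computes a few hand-crafted stylometric features; the feature set, the regular expressions, and the names `StyleFeatures` and `style_features` are illustrative assumptions rather than anything the survey specifies, and no claim is made that these features alone separate human from machine text.

```python
import re
import statistics
from dataclasses import dataclass

# Illustrative heuristics (assumptions): words are runs of letters/apostrophes,
# and sentences end at ., ! or ? followed by whitespace.
WORD_RE = re.compile(r"[A-Za-z']+")
SENT_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


@dataclass
class StyleFeatures:
    type_token_ratio: float    # lexical diversity: unique words / total words
    mean_sentence_len: float   # average number of words per sentence
    sentence_len_stdev: float  # spread of sentence lengths (a "burstiness" proxy)
    mean_word_len: float       # average number of characters per word


def style_features(text: str) -> StyleFeatures:
    """Compute a small, hypothetical stylometric feature vector for one text."""
    words = WORD_RE.findall(text.lower())
    if not words:
        raise ValueError("text contains no words")
    sentences = [s for s in SENT_SPLIT_RE.split(text.strip()) if s]
    sent_lens = [len(WORD_RE.findall(s)) for s in sentences]
    return StyleFeatures(
        type_token_ratio=len(set(words)) / len(words),
        mean_sentence_len=statistics.mean(sent_lens),
        sentence_len_stdev=statistics.pstdev(sent_lens),
        mean_word_len=sum(len(w) for w in words) / len(words),
    )


if __name__ == "__main__":
    print(style_features("The cat sat. The dog ran far away."))
```

In a full detector, vectors like this would typically be fed to a classifier trained on labeled human and machine text, which is where the training datasets the survey lists come in.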

Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

Can AI-Generated Text be Reliably Detected?

A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization

On the Possibilities of AI-Generated Text Detection

Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

Detecting AI Generated Text Based on NLP and Machine Learning Approaches

Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index

Testing of Detection Tools for AI-Generated Text

The imitation game: Detecting human and AI-generated texts in the era of ChatGPT and BARD

Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

An Empirical Study of AI Generated Text Detection Tools

Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts

Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text

Stylometric Detection of AI-Generated Text in Twitter Timelines

A Practical Examination of AI-Generated Text Detectors for Large Language Models

Detection of Machine-Generated Text: Literature Survey

Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text

Automatic Detection of Machine Generated Text: A Critical Survey

Are AI-Generated Text Detectors Robust to Adversarial Perturbations?

Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection