Abstract: The unregulated use of LLMs can lead to malicious consequences such as plagiarism, fake news generation, and spamming. Therefore, reliable detection of AI-generated text can be critical to ensuring the responsible use of LLMs. Recent works attempt to tackle this problem either by using model signatures present in the generated text or by applying watermarking techniques that imprint specific patterns onto it. In this paper, we show that these detectors are not reliable in practical scenarios. In particular, we develop a recursive paraphrasing attack on AI text that can break a whole range of detectors, including those using watermarking schemes as well as neural network-based detectors, zero-shot classifiers, and retrieval-based detectors. Our experiments include passages around 300 tokens in length, showing the sensitivity of the detectors even for relatively long passages. We also observe that our recursive paraphrasing degrades text quality only slightly, as measured by human studies and by metrics such as perplexity scores and accuracy on text benchmarks. Additionally, we show that even LLMs protected by watermarking schemes can be vulnerable to spoofing attacks that aim to mislead detectors into classifying human-written text as AI-generated, potentially causing reputational damage to the developers. In particular, we show that an adversary can infer hidden AI text signatures of the LLM outputs without white-box access to the detection method. Finally, we provide a theoretical connection between the AUROC of the best possible detector and the Total Variation distance between human and AI text distributions, which can be used to study the fundamental hardness of the reliable detection problem for advanced language models.
Our code is publicly available at https://github.com/vinusankars/Reliability-of-AI-text-detectors.
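The abstract mentions a connection between the AUROC of the best possible detector and the Total Variation (TV) distance but does not state its form. As a minimal sketch, assume only that any detector's true-positive rate can exceed its false-positive rate by at most the TV distance between the AI and human text distributions. Integrating the resulting ROC envelope min(FPR + TV, 1) gives the upper bound 1/2 + TV - TV^2/2. The function name `auroc_upper_bound` is hypothetical and is not taken from the authors' repository.

```python
def auroc_upper_bound(tv: float) -> float:
    """Upper bound on detector AUROC given a total variation distance.

    Assumption (not stated in the abstract): TPR <= min(FPR + tv, 1) for
    every detector threshold. The area under that envelope is
    1/2 + tv - tv**2 / 2.
    """
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv ** 2 / 2


if __name__ == "__main__":
    # As tv shrinks toward 0, the bound approaches 0.5 (random guessing).
    for tv in (0.0, 0.1, 0.5, 1.0):
        print(f"TV = {tv:.1f} -> AUROC <= {auroc_upper_bound(tv):.3f}")
```

Under this assumption, the bound falls to 0.5 as the two distributions become indistinguishable, which is the sense in which the TV distance captures the hardness of detection.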
