Abstract: Image manipulation detection and localization have received considerable attention from the research community given the rapid proliferation of Generative Models (GMs). Detection methods that follow a passive approach may overfit to specific GMs, which limits their application in real-world scenarios as the diversity of generative models grows. Recently, approaches based on a proactive framework have shown promise in addressing this limitation. However, these methods suffer from two main limitations, which raise concerns about potential vulnerabilities: i) the manipulation detector is not robust to noise and hence can be easily fooled; ii) their reliance on fixed perturbations for image protection offers a predictable exploit for malicious attackers, enabling them to reverse-engineer the perturbation and evade detection. To overcome these issues, we propose PADL, a new solution that generates image-specific perturbations using a symmetric encoding and decoding scheme based on cross-attention, which drastically reduces the possibility of reverse engineering, even when evaluated under an adaptive attack [31]. Additionally, PADL is able to pinpoint manipulated areas, facilitating the identification of specific regions that have undergone alterations, and generalizes better than prior art to held-out generative models. Indeed, although trained only on an attribute manipulation GAN model [15], our method generalizes to a range of unseen models with diverse architectural designs, such as StarGANv2, BlendGAN, DiffAE, StableDiffusion and StableDiffusionXL. Finally, we introduce a novel evaluation protocol, which offers a fair evaluation of localization performance as a function of detection accuracy and better captures real-world scenarios.

Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better

Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature

ChatGPT-Generated Code Assignment Detection Using Perplexity of Large Language Models (Student Abstract)

Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models

Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model

DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios

Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks

G3Detector: General GPT-Generated Text Detector

Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors

DetectGPT-SC: Improving Detection of Text Generated by Large Language Models through Self-Consistency with Masked Predictions

PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer

PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

Perturb, Attend, Detect and Localize (PADL): Robust Proactive Image Defense

AuthentiGPT: Detecting Machine-Generated Text via Black-Box Language Models Denoising

Enhancing Machine-Generated Text Detection: Adversarial Fine-Tuning of Pre-Trained Language Models

What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models

Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive Analysis

GPT Understands, Too

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

Beat LLMs at Their Own Game: Zero-Shot LLM-Generated Text Detection Via Querying ChatGPT

Learning perturbation sets for robust machine learning