Abstract: With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but they require access to LLMs' internal states and are not applicable to black-box settings. In this paper, we propose to estimate word generation probabilities as pseudo white-box features via multiple re-sampling to help improve AIGT detection under the black-box setting. Specifically, we design POGER, a proxy-guided efficient re-sampling method, which selects a small subset of representative words (e.g., 10 words) for performing multiple re-sampling in black-box AIGT detection. Experiments on datasets containing texts from humans and seven LLMs show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings and maintains lower re-sampling costs than its existing counterparts.

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to improve the performance of AI-generated text (AIGT) detection in a black-box setting. Although existing white-box methods are superior to black-box methods in performance and generalization ability, they require access to the internal states of large language models (LLMs), which is usually not feasible with commercial services. The paper therefore proposes POGER, a proxy-guided efficient re-sampling method that estimates word-generation probabilities as pseudo white-box features to improve AIGT detection in the black-box setting. By selecting a small number of representative words for multiple re-sampling, POGER can detect AI-generated text without accessing the model's internal states, and it outperforms existing baselines in macro F1 while keeping the re-sampling cost low.

### Key points:

1. **Problem background**:
   - With the wide application of large language models (LLMs), the quality of AI-generated text has improved significantly, but this has also brought social problems such as fake news, academic misconduct, and information pollution.
   - Existing AIGT detection methods are divided into white-box and black-box methods. White-box methods perform better but require access to the model's internal states, while black-box methods apply more widely but perform worse.
2. **Solution**:
   - Propose the POGER method, which estimates word-generation probabilities through multiple re-sampling and uses them as pseudo white-box features for AIGT detection in the black-box setting.
   - Select a small number of representative words for re-sampling, which reduces sampling cost while retaining the model-specific features.
3. **Experimental results**:
   - Experiments show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution (OOD) settings.
   - POGER performs well in multi-class classification and also achieves the best performance in binary classification.

### Formulas and technical details:

- **Standard Error (SE)**:
  \[ SE(\hat{p}_i)=\sqrt{\frac{p_i(1 - p_i)}{N}} \]
  where \(\hat{p}_i\) is the probability of word \(x_i\) estimated from \(N\) re-samplings, and \(p_i\) is the true probability.
- **Low-probability word selection**:
  \[ SE(\hat{p}_i)\leq\Delta\cdot p_i\Rightarrow p_i\geq\frac{1}{1 + N\Delta^2} \]
  Bounding the relative error by \(\Delta\) gives the smallest probability that \(N\) re-samplings can estimate within that error; this condition guides the selection of low-probability words.
- **Probability estimation**:
  \[ \hat{p}(x_i|x_{<i})=\frac{1}{N}\sum_{j = 1}^{N}I(o_j=x_i) \]
  where \(I(\cdot)\) is the indicator function and \(o_j\) is the \(j\)-th re-sampled output, so the estimate is the relative frequency of word \(x_i\) among the \(N\) re-samplings.
- **Context compensation**:
  \[ F = \text{Att}(L', C, C)\oplus\text{Att}(C, L', L') \]
  where \(\text{Att}\) denotes the attention mechanism, \(\oplus\) denotes concatenation, and \(L'\) and \(C\) are the probability features and the context features, respectively.

With these techniques, POGER addresses the performance and generalization problems of AIGT detection in the black-box setting.
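The probability estimate and the relative-error bound from the formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `mock_black_box` is a hypothetical stand-in for repeatedly querying a black-box LLM at one position, and the toy vocabulary, the value N = 100, and the value Δ = 0.2 are chosen only for the example.

```python
import math
import random
from collections import Counter


def estimate_probability(samples, word):
    """Relative frequency of `word` among the re-sampled outputs o_1..o_N."""
    return Counter(samples)[word] / len(samples)


def standard_error(p, n):
    """Standard error of the frequency estimate for true probability p after n re-samplings."""
    return math.sqrt(p * (1.0 - p) / n)


def min_reliable_probability(n, delta):
    """Smallest p satisfying SE <= delta * p, i.e. p >= 1 / (1 + n * delta^2)."""
    return 1.0 / (1.0 + n * delta ** 2)


def mock_black_box(rng, vocab, weights, n):
    """Hypothetical black-box model: draws n next-word samples from a fixed distribution."""
    return rng.choices(vocab, weights=weights, k=n)


# Assumed toy setup: three candidate words with known true probabilities.
rng = random.Random(0)
vocab = ["cat", "dog", "bird"]
weights = [0.6, 0.3, 0.1]
N = 100       # number of re-samplings (assumption)
DELTA = 0.2   # tolerated relative error (assumption)

samples = mock_black_box(rng, vocab, weights, N)
p_hat = estimate_probability(samples, "cat")
p_min = min_reliable_probability(N, DELTA)

print(f"estimated p('cat') = {p_hat:.2f}")
print(f"probability floor for relative error {DELTA} with N={N}: {p_min:.3f}")
```

With these assumed values the floor works out to roughly 0.2, so a word whose true probability is 0.1 (like "bird" here) would need more re-samplings before its estimate meets the same relative-error target.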

Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling
