Abstract: With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but they require access to LLMs' internal states and are not applicable to black-box settings. In this paper, we propose to estimate word generation probabilities as pseudo white-box features via multiple re-sampling to help improve AIGT detection under the black-box setting. Specifically, we design POGER, a proxy-guided efficient re-sampling method, which selects a small subset of representative words (e.g., 10 words) for performing multiple re-sampling in black-box AIGT detection. Experiments on datasets containing texts from humans and seven LLMs show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings and maintains lower re-sampling costs than its existing counterparts.
What problem does this paper attempt to address?
This paper asks how to improve AI-generated text (AIGT) detection in the black-box setting. Existing white-box methods outperform black-box methods in both performance and generalizability, but they require access to the internal states of large language models (LLMs), which commercial services usually do not expose. The paper therefore proposes POGER, a proxy-guided efficient re-sampling method that estimates word generation probabilities as pseudo white-box features. By re-sampling only a small subset of representative words (e.g., 10 words), POGER detects AI-generated text without access to the model's internal states, and it outperforms the baselines in macro F1 while keeping re-sampling costs lower than those of its existing counterparts.
### Key points:
1. **Problem background**:
- With the wide application of large language models (LLMs), the quality of AI-generated text has improved significantly, but this has also brought social problems such as fake news, academic misconduct, and information pollution.
- Existing AIGT detection methods are divided into white-box and black-box methods. White-box methods perform better but require access to the model's internal states, while black-box methods apply more widely but perform worse.
2. **Solution**:
- POGER estimates word generation probabilities through multiple re-sampling and uses them as pseudo white-box features for AIGT detection in the black-box setting.
- Only a small subset of representative words is re-sampled, which reduces sampling cost while retaining the features that distinguish each model.
3. **Experimental results**:
- On datasets containing texts from humans and seven LLMs, POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution (OOD) settings.
- POGER achieves the best performance in binary classification tasks as well as in multi-class classification tasks.
### Formulas and technical details:
- **Standard Error (SE)**:
\[
SE(\hat{p}_i)=\sqrt{\frac{p_i(1 - p_i)}{N}}
\]
where \(\hat{p}_i\) is the probability of word \(x_i\) estimated from \(N\) re-samplings, and \(p_i\) is its true probability.
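- **Low-probability word selection**:
\[
SE(\hat{p}_i)\leq\Delta\cdot p_i\Rightarrow p_i\geq\frac{1}{1 + N\Delta^2}
\]
Requiring the relative error \(SE(\hat{p}_i)/p_i\) to stay within \(\Delta\) and squaring both sides gives \(1 - p_i\leq N\Delta^2 p_i\), which rearranges to the bound above. Low-probability words are selected only if they satisfy this bound, so that their probabilities can still be estimated reliably from \(N\) re-samplings.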
- **Probability estimation**:
\[
\hat{p}(x_i|x_{<i})=\frac{1}{N}\sum_{j = 1}^{N}I(o_j=x_i)
\]
where \(I(\cdot)\) is the indicator function and \(o_j\) denotes the output of the \(j\)-th re-sampling, so the estimate is the relative frequency of word \(x_i\) among the \(N\) re-sampled outputs.
- **Context compensation**:
\[
F = \text{Att}(L', C, C)\oplus\text{Att}(C, L', L')
\]
where \(\text{Att}\) represents the attention mechanism, \(\oplus\) represents the concatenation operation, and \(L'\) and \(C\) are probability features and context features respectively.
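
The context compensation formula can be illustrated with a minimal NumPy sketch. It assumes that \(\text{Att}(Q, K, V)\) is plain scaled dot-product attention without learned projections, that \(L'\) and \(C\) are arrays of the same shape `(k, d)` (one row per selected word), and that \(\oplus\) concatenates along the feature axis. None of these details are stated in the summary, and the function names are hypothetical, so this is an illustration of the formula's structure rather than the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Assumption: no learned projection matrices are applied.
    """
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ value


def context_compensation(prob_feats, ctx_feats):
    """F = Att(L', C, C) (+) Att(C, L', L').

    prob_feats plays the role of L' and ctx_feats the role of C.
    Assumption: both have shape (k, d), and the two attention outputs
    are concatenated along the feature axis, giving shape (k, 2 * d).
    """
    probs_attend_context = attention(prob_feats, ctx_feats, ctx_feats)
    context_attends_probs = attention(ctx_feats, prob_feats, prob_feats)
    return np.concatenate([probs_attend_context, context_attends_probs], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, d = 10, 8  # 10 selected words, 8-dimensional features (illustrative sizes)
    fused = context_compensation(rng.normal(size=(k, d)), rng.normal(size=(k, d)))
    print(fused.shape)
```

Each half of the concatenation lets one feature stream query the other, so the fused representation carries both the probability features informed by context and the context features informed by probabilities.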
Together, these components let POGER approximate white-box probability features in the black-box setting, which underlies the performance and generalization gains reported in the experiments.
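
The re-sampling estimate, its standard error, and the probability threshold from the formulas above can be tied together in a short sketch. The toy sampler below stands in for a black-box LLM call. It, the function names, and the values of \(N\) and \(\Delta\) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def estimate_probability(sample_next_word, target_word, n_samples):
    """Relative frequency of target_word among n_samples re-sampled outputs."""
    hits = sum(1 for _ in range(n_samples) if sample_next_word() == target_word)
    return hits / n_samples


def standard_error(p, n_samples):
    """Standard error sqrt(p * (1 - p) / N) of the re-sampling estimate."""
    return math.sqrt(p * (1.0 - p) / n_samples)


def min_reliable_probability(n_samples, delta):
    """Smallest p whose relative error SE / p stays within delta."""
    return 1.0 / (1.0 + n_samples * delta ** 2)


def make_toy_sampler(distribution, seed=0):
    """Hypothetical stand-in for a black-box LLM: draws one word per call."""
    rng = random.Random(seed)
    words = list(distribution)
    weights = [distribution[w] for w in words]

    def sample():
        return rng.choices(words, weights=weights, k=1)[0]

    return sample


if __name__ == "__main__":
    toy_distribution = {"the": 0.6, "a": 0.3, "this": 0.1}  # made-up values
    n, delta = 100, 0.2  # illustrative choices
    sampler = make_toy_sampler(toy_distribution)
    threshold = min_reliable_probability(n, delta)
    for word, true_p in toy_distribution.items():
        p_hat = estimate_probability(sampler, word, n)
        reliable = true_p >= threshold
        print(word, p_hat, standard_error(true_p, n), reliable)
```

With \(N = 100\) and \(\Delta = 0.2\), the threshold evaluates to \(1/(1 + 100 \cdot 0.04) = 0.2\), and at exactly that probability the standard error equals \(\Delta \cdot p_i\), which is the boundary case of the inequality.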