On the Generalization of Training-based ChatGPT Detection Methods

Han Xu, Jie Ren, Pengfei He, Shenglai Zeng, Yingqian Cui, Amy Liu, Hui Liu, Jiliang Tang
2023-10-04
Abstract: ChatGPT is one of the most popular language models and achieves strong performance on a variety of natural language tasks. Consequently, there is an urgent need to distinguish texts generated by ChatGPT from human-written ones. One extensively studied approach trains classification models to tell the two apart. However, existing studies also show that the trained models may suffer from distribution shift at test time, i.e., they are ineffective at detecting generated texts from unseen language tasks or topics. In this work, we aim to comprehensively investigate these methods' generalization behavior under distribution shift caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To achieve this goal, we first collect a new dataset of human and ChatGPT texts, and then conduct extensive studies on it. Our studies yield findings that provide guidance for developing future methodologies and data collection strategies for ChatGPT detection.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to explore and study the generalization ability of training-based ChatGPT detection methods under different distributions. Specifically:

1. **Background and Motivation**:
   - ChatGPT, as a popular language model, performs excellently in various natural language tasks.
   - There is an urgent need to develop methods to distinguish between text generated by ChatGPT and text written by humans to prevent misuse (such as generating fake news, plagiarism, etc.).
2. **Problems with Existing Methods**:
   - Most existing methods distinguish between ChatGPT-generated text and human text by training classification models.
   - These methods perform well on specific datasets but their performance significantly drops when faced with unseen tasks or topics.
3. **Research Objectives**:
   - Study the generalization behavior of these training-based methods under different distributions.
   - Consider various factors affecting generalization, including prompts, text length, topics, and language tasks.
   - Collect new datasets and conduct extensive experimental analysis.
4. **Main Findings**:
   - A new dataset HC-Var is proposed, containing various types of text (news, reviews, writing, Q&A).
   - Analyzed the impact of different prompts and text lengths on the model's generalization performance.
   - Found that models tend to overfit to some irrelevant features while ignoring the true detection features.
   - Provided theoretical analysis explaining why certain data collection strategies lead to poor model generalization performance.

Through this research, the authors hope to provide guidance for the future development of better ChatGPT detection methods.
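
The evaluation setting summarized above (train a detector on one distribution, then test it on both seen and unseen ones) can be sketched roughly as follows. This is not the authors' code: the `NaiveBayesDetector` class, the `cross_domain_accuracy` helper, and the `TOY_DATA` sentences are all hypothetical, invented purely for illustration. The toy sentences are not samples from HC-Var, and a tiny naive Bayes classifier stands in for whatever detection model one would actually train.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; deliberately simplistic."""
    return text.lower().split()


class NaiveBayesDetector:
    """Tiny multinomial naive Bayes with add-one smoothing.

    A hypothetical stand-in for any training-based detector.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for c in self.classes:
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(self.doc_counts[c] / total_docs)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            scores[c] = score
        return max(self.classes, key=scores.get)


def accuracy(model, examples):
    """Fraction of (text, label) pairs the model labels correctly."""
    return sum(model.predict(t) == y for t, y in examples) / len(examples)


def cross_domain_accuracy(data_by_domain):
    """Train one detector per domain and test it on every domain.

    Returns {(train_domain, test_domain): accuracy}.
    """
    results = {}
    for train_domain, splits in data_by_domain.items():
        texts, labels = zip(*splits["train"])
        model = NaiveBayesDetector().fit(texts, labels)
        for test_domain, test_splits in data_by_domain.items():
            results[(train_domain, test_domain)] = accuracy(
                model, test_splits["test"]
            )
    return results


# Invented placeholder sentences, NOT real human or ChatGPT text.
TOY_DATA = {
    "news": {
        "train": [
            ("officials said rain hit the town", "human"),
            ("in conclusion the report highlights rain", "chatgpt"),
        ],
        "test": [
            ("officials said the town flooded", "human"),
            ("in conclusion the report highlights floods", "chatgpt"),
        ],
    },
    "reviews": {
        "train": [
            ("loved it tbh great pizza", "human"),
            ("overall the pizza offers great value", "chatgpt"),
        ],
        "test": [
            ("tbh loved the pasta", "human"),
            ("overall the pasta offers value", "chatgpt"),
        ],
    },
}

if __name__ == "__main__":
    for (train_d, test_d), acc in cross_domain_accuracy(TOY_DATA).items():
        print(f"train={train_d:8s} test={test_d:8s} accuracy={acc:.2f}")
```

In the returned dictionary, entries where the train and test domains match are in-distribution scores, while the remaining entries measure how well a detector transfers to an unseen topic. Under the same assumptions, the outer keys could just as well be prompts, text-length buckets, or language tasks, which are the other shift factors the paper considers.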