Detecting AI-generated essays: the ChatGPT challenge

Ilker Cingillioglu
DOI: https://doi.org/10.1108/ijilt-03-2023-0043
2023-05-01
Abstract:
Purpose: With the advent of ChatGPT, a sophisticated generative artificial intelligence (AI) tool, maintaining academic integrity in all educational settings has become a challenge for educators. This paper discusses a method and the strategies needed to confront this challenge.
Design/methodology/approach: In this study, a language model was defined to achieve high accuracy in distinguishing ChatGPT-generated essays from human-written essays, with a particular focus on not falsely classifying genuinely human-written essays as AI-generated (Negative).
Findings: Using a support vector machine (SVM) algorithm, 100% accuracy was recorded in identifying human-generated essays. The author discusses the key role of Recall and the F2 score in measuring classification performance, and the importance of eliminating False Negatives so that no genuinely human-generated essay is incorrectly classified as AI-generated. The results of the proposed model's classification algorithms were compared with those of AI-generated text detection software developed by OpenAI, GPTZero and Copyleaks.
Practical implications: Teachers and educational designers can detect AI-generated essays submitted by students with high accuracy using the proposed language model and machine learning (ML) classifier. Human (student)-generated essays can and must be correctly identified with 100% accuracy, even if overall classification accuracy is slightly reduced.
Originality/value: This is the first and only study that used an n-gram bag-of-words (BOW) discrepancy language model as input for a classifier to make such a prediction and that empirically compared its classification results with those of other AI-generated text detection software.
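The emphasis on Recall and the F2 score can be illustrated with a small worked sketch. It assumes, following the abstract's wording, that human-written essays are the positive class, so that a False Negative is a human essay flagged as AI-generated. The labels, helper names and example predictions below are hypothetical and are not taken from the paper's data or code.

```python
def confusion_counts(y_true, y_pred, positive="human"):
    """Count true positives, false positives and false negatives
    for the chosen positive class (assumed here to be 'human')."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def recall(tp, fn):
    """Share of actual positives that were recovered: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of predicted positives that were correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


# Hypothetical toy labels: four human essays, four AI-generated essays.
y_true = ["human"] * 4 + ["ai"] * 4
# Hypothetical predictions: every human essay is recognised as human,
# and one AI essay slips through as "human".
y_pred = ["human", "human", "human", "human", "human", "ai", "ai", "ai"]

tp, fp, fn = confusion_counts(y_true, y_pred)
human_recall = recall(tp, fn)
f2 = f_beta(tp, fp, fn, beta=2.0)
print(f"TP={tp} FP={fp} FN={fn} recall={human_recall:.3f} F2={f2:.3f}")
```

In this toy case no human essay is misclassified, so recall for the human class is 1.0 even though overall accuracy is 7/8. That is the trade-off the abstract describes: perfect identification of human essays at the cost of slightly lower overall accuracy.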