Gotcha GPT: Ensuring the Integrity in Academic Writing

João Gabriel Gralha, André Silva Pimentel
DOI: https://doi.org/10.1021/acs.jcim.4c01203
2024-11-11
Abstract: This application note explores how to address a challenging problem that many academics and publishing professionals have faced in recent years: ensuring the integrity of academic writing in universities and publishing houses in the face of advances in Artificial Intelligence (AI). It distinguishes AI-generated from human-generated English manuscripts using classifier models such as decision tree, random forest, extra trees, and AdaBoost. It uses the scikit-learn library to report statistics (precision, accuracy, recall, F1, Matthews correlation coefficient (MCC), and Cohen's kappa scores) and the confusion matrix, giving the user confidence in the results. Model accuracy on the classification task ranges from 0.97 to 0.99. A data set of approximately 400 AI-generated texts and around 400 human-generated texts is used for training and testing, with a 50/50 random split. The AI texts were generated using detailed prompts that describe the format of the abstracts, introductions, discussions, and conclusions of scientific manuscripts on specific subjects. The Gotcha GPT tutorials are written in Python on the Google Colaboratory platform and are freely available on GitHub (https://github.com/andresilvapimentel/Gotcha-GPT).
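The classification workflow summarized in the abstract (a 50/50 random split, four tree-based scikit-learn classifiers, and the listed evaluation metrics plus a confusion matrix) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Gotcha GPT code: the abstract does not say how manuscripts are turned into features, so the TF-IDF vectorizer, the default hyperparameters, the stratified split, and the tiny placeholder corpus are all assumptions made for the example. The authors' actual notebooks are in the GitHub repository linked in the abstract.

```python
# Minimal sketch: TF-IDF features + the four classifiers named in the abstract.
# ASSUMPTIONS (not from the paper): TF-IDF word unigrams/bigrams as features,
# default hyperparameters, a stratified split, and a hypothetical toy corpus.
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    confusion_matrix,
    f1_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical placeholder corpus: label 1 = AI-generated, 0 = human-written.
ai_texts = [
    f"In conclusion, this study comprehensively demonstrates the pivotal role of factor {i}."
    for i in range(20)
]
human_texts = [
    f"We measured sample {i} twice; the second run drifted, so we dropped it."
    for i in range(20)
]
texts = ai_texts + human_texts
labels = [1] * len(ai_texts) + [0] * len(human_texts)

# 50/50 random split; stratify keeps both classes balanced in each half.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "extra_trees": ExtraTreesClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}

results = {}
for name, clf in classifiers.items():
    # Fit the vectorizer on the training half only, then train the classifier.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Metrics are computed on the held-out half only.
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "mcc": matthews_corrcoef(y_test, y_pred),
        "kappa": cohen_kappa_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {k: v for k, v in scores.items() if k != "confusion_matrix"})
    print(scores["confusion_matrix"])
```

Wrapping the vectorizer and classifier in one pipeline ensures the TF-IDF vocabulary is learned from the training half alone, so no information from the test texts leaks into the features. Scores on this toy corpus say nothing about real manuscripts; they only show where each metric comes from.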