Toward the Adoption of Explainable Pre-Trained Large Language Models for Classifying Human-Written and AI-Generated Sentences
Luca Petrillo,Fabio Martinelli,Antonella Santone,Francesco Mercaldo
DOI: https://doi.org/10.3390/electronics13204057
IF: 2.9
2024-10-16
Electronics
Abstract:Pre-trained large language models have demonstrated impressive text generation capabilities, including understanding, writing, and performing many tasks in natural language. Moreover, with time and improvements in training and text generation techniques, these models are proving efficient at generating increasingly human-like content. However, they can also be modified to generate persuasive, contextual content weaponized for malicious purposes, including disinformation and novel social engineering attacks. In this paper, we present a study on identifying human- and AI-generated content using different models. Precisely, we fine-tune different models belonging to the BERT family, an open-source version of the GPT model, ELECTRA, and XLNet, and then perform a text classification task using two different labeled datasets—the first one consisting of 25,000 sentences generated by both AI and humans and the second comprising 22,929 abstracts that are ChatGPT-generated and written by humans. Furthermore, we perform an additional phase where we submit 20 sentences generated by ChatGPT and 20 sentences randomly extracted from Wikipedia to our fine-tuned models to verify the efficiency and robustness of the latter. In order to understand the prediction of the models, we performed an explainability phase using two sentences: one generated by the AI and one written by a human. We leveraged the integrated gradients and token importance techniques, analyzing the words and subwords of the two sentences. As a result of the first experiment, we achieved an average accuracy of 99%, precision of 98%, recall of 99%, and F1-score of 99%. For the second experiment, we reached an average accuracy of 51%, precision of 50%, recall of 52%, and F1-score of 51%.
engineering, electrical & electronic,computer science, information systems,physics, applied