UNCOVER: Identifying AI Generated News Articles by Linguistic Analysis and Visualization

Tim Cech, Jürgen Döllner, Willy Scheibel, Jannis Baum, Lucas Liebe, Tilman Schütze
DOI: https://doi.org/10.5220/0012163300003598
Abstract: Text synthesis tools are becoming increasingly popular and better at mimicking human language. In trust-sensitive decisions, such as plagiarism and fraud detection, identifying AI-generated texts is particularly difficult because decisions need to be explainable to ensure trust and accountability. To support users in identifying AI-generated texts, we propose the tool UNCOVER. The tool analyses texts through three explainable linguistic approaches: stylometric writing-style analysis, topic modeling, and entity recognition. The tool outputs a prediction together with a visualization of the analysis. We evaluate the tool on news articles by measuring the accuracy of its prediction and through an expert study with 13 participants. The final prediction is based on a classification of the stylometric and evolving-topic analyses. It achieved an accuracy of 70.4% and a weighted F1-score of 85.6%. The participants preferred to base their assessment on the prediction and the topic graph. In contrast, they found entity recognition to be an ineffective indicator. Moreover, five participants highlighted the explainability of UNCOVER, and overall the participants achieved an accuracy of 69%. Eight participants expressed interest in continuing to use UNCOVER for identifying AI-generated texts.
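Stylometric analysis, one of the three approaches named in the abstract, compares measurable writing-style statistics across texts. The abstract does not list the features UNCOVER actually uses, so the following is only a minimal sketch, assuming a few common and easily explained features (average sentence length, average word length, type-token ratio, and function-word rate). The function name `stylometric_features` and the small function-word list are hypothetical illustrations, not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately small function-word list for illustration only.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but", "of", "to",
                  "in", "on", "that", "is", "was", "it"}


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few simple, interpretable writing-style features (assumed set)."""
    # Split into sentences on terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens made of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        # No usable text: return neutral zero features instead of dividing by zero.
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0,
                "type_token_ratio": 0.0, "function_word_rate": 0.0}
    counts = Counter(words)
    return {
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(counts) / len(words),
        # Share of tokens that are function words.
        "function_word_rate": sum(counts[w] for w in FUNCTION_WORDS) / len(words),
    }


if __name__ == "__main__":
    print(stylometric_features("The cat sat. The dog ran fast!"))
```

Feature vectors of this kind could then be passed to any standard classifier; because each value has a plain-language meaning, a prediction built on them remains comparatively easy to explain, which matches the explainability goal stated in the abstract.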
Linguistics, Computer Science