Abstract: The academic and scientific community is increasingly concerned about its inability to ascertain the identity of the writer of a text. More and more often, the question arises as to whether a scientific article or a piece of work handed in by a student was actually produced by its alleged author. The role of artificial intelligence (AI) is increasingly debated because of the dangers of its undeclared use. A current example is the undeclared use of ChatGPT to write a scientific text. This article proposes a redundancy measure, independent of the AI model, to support discrimination between hypotheses on the authorship of multilingual texts written by humans or produced by AI tools such as ChatGPT. The syntax of texts written by humans tends to differ from that of texts produced by AIs. This difference can be captured and quantified even with short texts (i.e. 1800 characters). Text length matters greatly, because short texts make authorship harder to characterize. To meet the efficiency criteria required for the evaluation of forensic evidence, a probabilistic approach is adopted. In particular, a metric called the Bayes factor is used to assess the value of the redundancy measure and to offer a consistent classification criterion. The proposed Bayesian probabilistic method represents an original approach in stylometry. Analyses performed on multilingual texts (English and French) covering different scientific and human areas of interest (forensic science and socio-psycho-artistic topics) show that successful authorship discrimination is feasible, with limited misclassification rates. Model performance is satisfactory even with small sample sizes.
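
To make the role of the Bayes factor concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that a scalar redundancy score has already been computed for each text (the abstract does not define the measure, so the scores below are invented placeholders), and that the scores under each authorship hypothesis are roughly normally distributed. Under those assumptions, the Bayes factor is approximated by a plug-in likelihood ratio. The names `normal_pdf`, `bayes_factor`, `human_scores` and `ai_scores` are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_factor(score, human_scores, ai_scores):
    """Plug-in likelihood ratio: evidence for 'human' over 'AI' authorship.

    Assumption: scores under each hypothesis are normal, with parameters
    estimated from reference texts of known authorship. A full Bayesian
    treatment would integrate over parameter uncertainty instead.
    """
    mu_h, sd_h = mean(human_scores), stdev(human_scores)
    mu_a, sd_a = mean(ai_scores), stdev(ai_scores)
    return normal_pdf(score, mu_h, sd_h) / normal_pdf(score, mu_a, sd_a)


# Invented placeholder scores for reference texts of known authorship.
human_scores = [0.52, 0.55, 0.50, 0.57, 0.53]
ai_scores = [0.61, 0.64, 0.60, 0.66, 0.63]

# Score of a questioned text (also a placeholder value).
bf = bayes_factor(0.54, human_scores, ai_scores)
label = "human" if bf > 1 else "AI"
print(f"Bayes factor = {bf:.3g}; the evidence supports the {label} hypothesis")
```

A value above 1 supports the human-authorship hypothesis and a value below 1 supports the AI hypothesis; the further the value is from 1, the stronger the support. Classifying at a threshold of 1 is one simple choice and implicitly treats both types of misclassification as equally costly.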

A model-independent redundancy measure for human versus ChatGPT authorship discrimination using a Bayesian probabilistic approach
