Have AI-Generated Texts from LLM Infiltrated the Realm of Scientific Writing? A Large-Scale Analysis of Preprint Platforms

Huzi Cheng, Bin Sheng, Aaron Lee, Varun Chaudary, Atanas G. Atanasov, Nan Liu, Yue Qiu, Tien Yin Wong, Yih-Chung Tham, Yingfeng Zheng
DOI: https://doi.org/10.1101/2024.03.25.586710
2024-03-30
Abstract: Since the release of ChatGPT in 2022, AI-generated texts have inevitably permeated various types of writing, sparking debates about the quality and quantity of content produced by such large language models (LLMs). This study investigates a critical question: Have AI-generated texts from LLMs infiltrated the realm of scientific writing, and if so, to what extent and in what settings? By analyzing a dataset composed of preprint manuscripts uploaded to arXiv, bioRxiv, and medRxiv over the past two years, we confirmed and quantified the widespread influence of AI-generated texts in scientific publications using the latest LLM-text detection technique, the Binoculars LLM-detector. Further analyses with this tool reveal that: (1) the AI influence correlates with the trend of ChatGPT web searches; (2) it is widespread across many scientific domains but exhibits distinct impacts within them (highest: computer science, engineering sciences); (3) the influence varies with authors' language backgrounds and geographic regions, based on the locations of their affiliations (Italy, China, etc.); (4) AI-generated texts are used in various content types in manuscripts (most significant: hypothesis formulation, conclusion summarization); (5) AI usage has a positive influence on a paper’s impact, as measured by its citation count. Based on these findings, we discuss suggestions about the advantages and regulation of AI-augmented scientific writing.
Scientific Communication and Education