Evident gap between generative artificial intelligence as an academic editor compared to human editors in scientific publishing

Malik Sallam, Kholoud Al-Mahzoum, Omar Marzoaq, Mohammad Alfadhel, Amer Al-Ajmi, Mansour Al-Ajmi, Mohammad Al-Hajeri, Muna Barakat
DOI: https://doi.org/10.55214/25768484.v8i6.2189
2024-10-08
Edelweiss Applied Science and Technology
Abstract: The labyrinthine process of manuscript evaluation in scientific publishing often delays the dissemination of timely research results. Generative Artificial Intelligence (genAI) models could potentially enhance efficiency in academic publishing. However, it is crucial to scrutinize the reliability of genAI in simulating human editorial decisions. This study analyzed 34 manuscripts authored by the corresponding author, involving initial editorial decisions from six publishers across 28 journals. Two genAI models, ChatGPT-4o and Microsoft Copilot, assessed these manuscripts using tailored prompts. The correlation between genAI and actual human editorial decisions was evaluated using Kendall’s τb. The time taken to reach the original editorial decision and the quality of the genAI outputs, as rated with the CLEAR tool, were also recorded. Editorial decision-making by genAI models was instantaneous, compared to the editors’ average of 21.6±31.1 days. Both models achieved high scores on the CLEAR tool, averaging 4.8±0.4 for ChatGPT-4o and 4.8±0.5 for Copilot. Despite these efficiencies, there was no significant correlation between the genAI and human decisions (τb=0.121, P=.487 for ChatGPT-4o; τb=0.197, P=.258 for Copilot), nor between the decisions of the two genAI models (τb=0.318, P=.068). This preliminary study indicated that genAI models can expedite the editorial process with high-quality outputs. However, genAI has not yet achieved the accuracy of human editors in decision-making.
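For readers unfamiliar with the statistic, the following is a minimal sketch of how Kendall’s τb could be computed for two sets of ordinal editorial decisions. It is not the authors' analysis code, and the abstract does not say which software they used. The function name `kendall_tau_b`, the numeric coding of decisions (0 = reject, 1 = major revision, 2 = minor revision, 3 = accept), and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of ordinal scores.

    tau_b = (C - D) / sqrt((n0 - n1) * (n0 - n2)), where C and D are the
    numbers of concordant and discordant pairs, n0 = n(n-1)/2, and n1, n2
    count the pairs tied within x and within y respectively.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")

    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means the pair is tied in x or y; it counts for neither

    n0 = n * (n - 1) // 2
    ties_x = sum(t * (t - 1) // 2 for t in Counter(x).values())
    ties_y = sum(t * (t - 1) // 2 for t in Counter(y).values())
    denom = sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denom


# Hypothetical decisions for six manuscripts (NOT data from the study).
# Assumed coding: 0 = reject, 1 = major revision, 2 = minor revision, 3 = accept
human_decisions = [0, 1, 1, 2, 3, 0]
genai_decisions = [1, 1, 2, 2, 2, 1]

tau = kendall_tau_b(human_decisions, genai_decisions)
print(f"tau-b = {tau:.3f}")
```

The τb variant is a natural fit here because editorial decisions fall into a few ordered categories, so many manuscript pairs are tied; τb adjusts the denominator for those ties. A P value, as reported in the abstract, would additionally require a significance test, which this sketch omits.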