AI discernment in foot and ankle surgery research: A survey investigation

Steven R Cooperman, Abisola Olaniyan, Roberto A Brandão
DOI: https://doi.org/10.1016/j.fas.2024.10.001
2024-10-09
Abstract: Background: This study evaluated the ability to differentiate between AI-generated and human-authored abstracts in foot and ankle surgery. Methods: An AI system (ChatGPT 3.0) was trained on 21 published abstracts to create six novel case abstracts. Nine foot and ankle surgeons participated in a blinded survey in which they were asked to distinguish AI-generated from human-written abstracts and to rate their confidence in their responses. Surveys were completed at two time points to evaluate intra- and inter-observer reliability. Results: The overall accuracy rate for distinguishing AI-generated from human-written abstracts was 50.5% (n = 109/216), indicating performance no better than random chance. Reviewer experience and AI familiarity did not significantly affect accuracy. Inter-rater reliability was moderate initially but decreased over time, and intra-rater reliability was poor. Conclusions: In their current form, AI-generated abstracts are nearly indistinguishable from human-written ones, posing challenges for consistent identification in foot and ankle surgery. Level of evidence: IV.
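As a rough check on the "no better than random chance" reading of 109 correct out of 216 responses, the sketch below recomputes the accuracy and an exact two-sided binomial p-value against a chance rate of 0.5. The abstract does not state which test the authors used, so the choice of test and the function name `exact_binomial_two_sided_p` are illustrative assumptions. The test also treats all 216 responses as independent, which is a simplification given that they come from nine raters surveyed twice.

```python
from math import comb


def exact_binomial_two_sided_p(successes: int, trials: int) -> float:
    """Two-sided exact binomial p-value against a chance rate of 0.5.

    Assumption: responses are independent trials, which the abstract
    does not claim (nine raters answered the survey twice).
    """
    observed_weight = comb(trials, successes)
    # Under p = 0.5, outcome k has probability comb(trials, k) / 2**trials,
    # so "at least as extreme" means a weight no larger than the observed one.
    extreme_weight = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if comb(trials, k) <= observed_weight
    )
    return extreme_weight / 2 ** trials


correct, total = 109, 216  # counts reported in the abstract
accuracy = correct / total
p_value = exact_binomial_two_sided_p(correct, total)

print(f"accuracy = {accuracy:.1%}")
print(f"two-sided p = {p_value:.3f}")
```

Because 109 is only one response above the chance expectation of 108, the p-value is far above the conventional 0.05 threshold, which is consistent with the abstract's interpretation.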