Evaluation of ChatGPT’s Usefulness and Accuracy in Diagnostic Surgical Pathology

Vincenzo Guastafierro, Devin Nicole Corbitt, Alessandra Bressan, Bethania Fernandes, Ömer Mintemur, Francesca Magnoli, Susanna Ronchi, Stefano La Rosa, Silvia Uccella, Salvatore Lorenzo Renne
DOI: https://doi.org/10.1101/2024.03.12.24304153
2024-03-13
Abstract: ChatGPT is an artificial intelligence model capable of processing and generating human-like language. ChatGPT’s role in clinical patient care and medical education has been explored; however, its potential to support histopathological diagnosis has not been assessed. In this study, we assessed ChatGPT’s reliability in addressing pathology-related diagnostic questions across 10 subspecialties, as well as its ability to provide scientific references. We created five clinico-pathological scenarios for each subspecialty, posed to ChatGPT as open-ended or multiple-choice questions. Each question either asked for scientific references or did not. Outputs were assessed by six pathologists according to 1) usefulness in supporting the diagnosis and 2) absolute number of errors. All references were manually verified. We used directed acyclic graphs and structural causal models to determine the effect of each scenario type, field, question modality and pathologist evaluation. Overall, we obtained 894 evaluations. ChatGPT provided useful answers in 62.2% of cases. 32.1% of outputs contained no errors, while the remainder contained at least one error (maximum 18). ChatGPT provided 214 bibliographic references: 70.1% were correct, 12.1% were inaccurate and 17.8% did not correspond to a publication. Scenario variability had the greatest impact on ratings, followed by prompting strategy. Finally, latent knowledge across the fields showed minimal variation. In conclusion, ChatGPT provided useful responses in one-third of cases, but the number of errors and the variability highlight that it is not yet adequate for everyday diagnostic practice and should be used with discretion as a support tool. The lack of thoroughness in providing references also suggests that caution is warranted even when it is used as a self-learning tool.
It is essential to recognize the irreplaceable role of human experts in synthesizing images, clinical data and experience for the intricate task of histopathological diagnosis.
Pathology
What problem does this paper attempt to address?
This paper assesses the usefulness and accuracy of ChatGPT in diagnostic surgical pathology. The study created five clinico-pathological scenarios for each of 10 subspecialties and posed them to ChatGPT as open-ended or multiple-choice questions, to test its reliability in answering pathology-related diagnostic questions and its ability to provide scientific references. Six pathologists evaluated ChatGPT's answers, rating their usefulness in supporting the diagnosis and counting the number of errors. ChatGPT provided useful answers in 62.2% of cases, but only 32.1% of the answers were error-free; the remainder contained at least one error (up to 18). Of the 214 references ChatGPT provided, 70.1% were correct, 12.1% were inaccurate, and 17.8% did not correspond to any publication. The study concludes that ChatGPT may serve as a support tool in some cases, but because of its error rate and variability it is not yet suitable for routine diagnostic practice and should be used with caution. When it is used as a self-learning tool, the references it provides should also be checked carefully. The paper emphasizes the irreplaceable role of human experts in integrating images, clinical data, and experience.
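The reference breakdown is reported only as percentages of 214. As a quick consistency check, the sketch below back-calculates which integer counts reproduce those percentages when rounded to one decimal place. The category names and the function `consistent_counts` are hypothetical labels introduced here, and the resulting counts are an inference from the rounded figures, not numbers reported by the authors.

```python
TOTAL_REFS = 214
# Reported shares of the 214 references, in percent, rounded to one decimal.
# Category keys are illustrative labels, not the paper's terminology.
REPORTED = {"correct": 70.1, "inaccurate": 12.1, "nonexistent": 17.8}


def consistent_counts(total, reported):
    """Return every integer split of `total` across the three categories
    whose percentages, rounded to one decimal place, match `reported`."""
    names = list(reported)
    matches = []
    for a in range(total + 1):
        for b in range(total + 1 - a):
            c = total - a - b  # the three categories must sum to the total
            counts = dict(zip(names, (a, b, c)))
            if all(round(100 * counts[k] / total, 1) == reported[k] for k in names):
                matches.append(counts)
    return matches


if __name__ == "__main__":
    for counts in consistent_counts(TOTAL_REFS, REPORTED):
        print(counts)
```

Under this rounding assumption, exactly one split fits: 150 correct, 26 inaccurate and 38 non-existent references, which together account for all 214.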