Evaluation of ChatGPT’s Usefulness and Accuracy in Diagnostic Surgical Pathology

Vincenzo Guastafierro, Devin Nicole Corbitt, Alessandra Bressan, Bethania Fernandes, Ömer Mintemur, Francesca Magnoli, Susanna Ronchi, Stefano La Rosa, Silvia Uccella, Salvatore Lorenzo Renne

DOI: https://doi.org/10.1101/2024.03.12.24304153

2024-03-13

Abstract: ChatGPT is an artificial intelligence capable of processing and generating human-like language. ChatGPT’s role within clinical patient care and medical education has been explored; however, assessment of its potential in supporting histopathological diagnosis is lacking. In this study, we assessed ChatGPT’s reliability in addressing pathology-related diagnostic questions across 10 subspecialties, as well as its ability to provide scientific references. We created five clinico-pathological scenarios for each subspecialty, posed to ChatGPT as open-ended or multiple-choice questions. Each question either asked for scientific references or not. Outputs were assessed by six pathologists according to: 1) usefulness in supporting the diagnosis and 2) absolute number of errors. All references were manually verified. We used directed acyclic graphs and structural causal models to determine the effect of each scenario type, field, question modality and pathologist evaluation. Overall, we yielded 894 evaluations. ChatGPT provided useful answers in 62.2% of cases. 32.1% of outputs contained no errors, while the remaining contained at least one error (maximum 18). ChatGPT provided 214 bibliographic references: 70.1% were correct, 12.1% were inaccurate and 17.8% did not correspond to a publication. Scenario variability had the greatest impact on ratings, followed by prompting strategy. Finally, latent knowledge across the fields showed minimal variation. In conclusion, ChatGPT provided useful responses in one-third of cases, but the number of errors and variability highlight that it is not yet adequate for everyday diagnostic practice and should be used with discretion as a support tool. The lack of thoroughness in providing references also suggests caution should be employed even when used as a self-learning tool.
It is essential to recognize the irreplaceable role of human experts in synthesizing images, clinical data and experience for the intricate task of histopathological diagnosis.

Pathology

What problem does this paper attempt to address?

This paper assesses the usefulness and accuracy of ChatGPT in diagnostic surgical pathology. The authors created five clinico-pathological scenarios for each of 10 subspecialties and posed them to ChatGPT as open-ended or multiple-choice questions, with or without a request for scientific references, to test its reliability in answering pathology-related diagnostic questions and its ability to provide references. Six pathologists rated each output on its usefulness in supporting the diagnosis and on the absolute number of errors, yielding 894 evaluations. ChatGPT provided useful answers in 62.2% of cases, but only 32.1% of outputs were error-free; the rest contained at least one error (maximum 18). Of the 214 references ChatGPT provided, 70.1% were correct, 12.1% were inaccurate, and 17.8% did not correspond to any publication. The study concludes that ChatGPT can serve as a support tool in some cases but, given its error rate and variability, is not yet adequate for everyday diagnostic practice and should be used with discretion. Because its references are unreliable, caution is also warranted when it is used as a self-learning tool. The paper emphasizes the irreplaceable role of human experts in integrating images, clinical data, and experience.
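As a rough sanity check on the reference figures, the sketch below back-calculates the integer counts implied by the reported percentages of the 214 references. The counts it prints are an inference from rounded percentages, not numbers stated in the abstract, and the names `implied_counts`, `TOTAL_REFERENCES` and `REPORTED_PERCENTAGES` are illustrative only.

```python
# Back-calculate approximate reference counts from the rounded percentages
# reported in the abstract. The integer counts are an assumption derived
# here, not figures given by the authors.
TOTAL_REFERENCES = 214
REPORTED_PERCENTAGES = {
    "correct": 70.1,
    "inaccurate": 12.1,
    "no matching publication": 17.8,
}


def implied_counts(total, percentages):
    """Return the nearest integer count implied by each rounded percentage."""
    return {label: round(total * pct / 100) for label, pct in percentages.items()}


counts = implied_counts(TOTAL_REFERENCES, REPORTED_PERCENTAGES)

for label, count in counts.items():
    # Recompute the percentage from the integer count to confirm it rounds
    # back to the reported value.
    recomputed = 100 * count / TOTAL_REFERENCES
    print(f"{label}: about {count} references ({recomputed:.1f}%)")

print("sum of implied counts:", sum(counts.values()))
```

If the implied counts sum to the reported total and each recomputed percentage matches the abstract, the three categories are mutually consistent; on these figures, roughly three in ten references would need correction or could not be found.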

Unveiling the risks of ChatGPT in diagnostic surgical pathology

Evaluation of ChatGPT pathology knowledge using board-style questions

Application of ChatGPT in Routine Diagnostic Pathology: Promises, Pitfalls, and Potential Future Directions

Assessment of Pathology Domain-Specific Knowledge of ChatGPT and Comparison to Human Performance

Evaluation of ChatGPT as a diagnostic tool for medical learners and clinicians

Navigating the path to precision: ChatGPT as a tool in pathology

Comparative analysis of ChatGPT and Bard in answering pathology examination questions requiring image interpretation

Assessing ChatGPT's theoretical knowledge and prescriptive accuracy in bacterial infections: a comparative study with infectious diseases residents and specialists

Assessing ChatGPT's potential as a clinical resource for medical oncologists: An evaluation with board-style questions and real-world patient cases.

Possible benefits, challenges, pitfalls, and future perspective of using ChatGPT in pathology

Evaluating Chatgpt As an Adjunct for Analyzing Challenging Case

Accuracy of Information and References Using ChatGPT-3 for Retrieval of Clinical Radiological Information

ChatGPT in Clinical Medicine, Urology and Academia: A Review

The Accuracy of Artificial Intelligence ChatGPT in Oncology Examination Questions

ChatGPT as a teaching tool: Preparing pathology residents for board examination with AI-generated digestive system pathology tests

Assessing the Accuracy, Completeness, and Reliability of Artificial Intelligence-Generated Responses in Dentistry: A Pilot Study Evaluating the ChatGPT Model

The utility of ChatGPT in subspecialty consultation for patients (pts) with metastatic genitourinary (GU) cancer.

Applicability of ChatGPT in Assisting to Solve Higher Order Problems in Pathology

Performance of ChatGPT in Diagnosis of Corneal Eye Diseases