Assessment of Pathology Domain-Specific Knowledge of ChatGPT and Comparison to Human Performance
Andrew Y Wang,Sherman Lin,Christopher Tran,Robert J Homer,Dan Wilsdon,Joanna C Walsh,Emily A Goebel,Irene Sansano,Snehal Sonawane,Vincent Cockenpot,Sanjay Mukhopadhyay,Toros Taskin,Nusrat Zahra,Luca Cima,Orhan Semerci,Birsen Gizem Özamrak,Pallavi Mishra,Naga Sarika Vennavalli,Po-Hsuan Cameron Chen,Matthew J Cecchini
DOI: https://doi.org/10.5858/arpa.2023-0296-oa
2024-01-20
Archives of Pathology & Laboratory Medicine
Abstract: Context.— Artificial intelligence algorithms hold the potential to fundamentally change many aspects of society. Application of these tools, including the publicly available ChatGPT, has demonstrated impressive domain-specific knowledge in many areas, including medicine. Objectives.— To understand the level of pathology domain-specific knowledge for ChatGPT using different underlying large language models, GPT-3.5 and the updated GPT-4. Design.— An international group of pathologists (n = 15) was recruited to generate pathology-specific questions at a similar level to those that could be seen on licensing (board) examinations. The questions (n = 15) were answered by GPT-3.5, GPT-4, and a staff pathologist who had recently passed their Canadian pathology licensing exams. Participants were instructed to score answers on a 5-point scale and to predict which answer was written by ChatGPT. Results.— GPT-3.5 performed at a similar level to the staff pathologist, while GPT-4 outperformed both. The overall score for both GPT-3.5 and GPT-4 was within the range of meeting expectations for a trainee writing licensing examinations. In all but one question, the reviewers were able to correctly identify the answers generated by GPT-3.5. Conclusions.— By demonstrating the ability of ChatGPT to answer pathology-specific questions at a level similar to (GPT-3.5) or exceeding (GPT-4) a trained pathologist, this study highlights the potential of large language models to be transformative in this space. In the future, more advanced iterations of these algorithms with increased domain-specific knowledge may have the potential to assist pathologists and enhance pathology resident training.
pathology,medical laboratory technology,medicine, research & experimental