Enhancing chatbot performance for imaging recommendations: Leveraging GPT-4 and context-awareness for trustworthy clinical guidance

Alexander Rau, Fabian Bamberg, Anna Fink, Phuong Hien Tran, Marco Reisert, Maximilian F Russe
DOI: https://doi.org/10.1016/j.ejrad.2024.111756
2024-09-24
Abstract: Purpose: To investigate whether GPT-4 improves the accuracy, consistency, and trustworthiness of a context-aware chatbot that provides personalized imaging recommendations from American College of Radiology (ACR) appropriateness criteria documents using semantic similarity processing. In addition, we sought to enable auditability of the output by revealing the information source on which the decision relies. Material and methods: We refined an existing chatbot that incorporated specialized knowledge of the ACR guidelines by upgrading GPT-3.5-Turbo to its successor GPT-4 by OpenAI, using the latest version of LlamaIndex, and improving the prompting strategy. This chatbot was compared with the previous version, generic GPT-3.5-Turbo and GPT-4, and general radiologists regarding performance in applying the ACR appropriateness guidelines. Results: The refined context-aware chatbot outperformed the previous version using GPT-3.5-Turbo, the generic chatbots GPT-3.5-Turbo and GPT-4, and general radiologists in providing "usually or may be appropriate" recommendations according to the ACR guidelines (all p < 0.001). It also outperformed GPT-3.5-Turbo and general radiologists with respect to "usually appropriate" recommendations (both p < 0.001). Moreover, the consistency of correct answers was higher, with 78 % consistently correct "usually appropriate" answers and 94 % consistently correct "usually or may be appropriate" recommendations. In all cases, the same source documents were chosen, ensuring transparency. Conclusion: Our study demonstrates the significance of context awareness in ensuring the use of appropriate knowledge and proposes a strategy to enhance trust in chatbot-based outputs by providing transparency. The improvements in accuracy, consistency, and source transparency address trust issues and enhance the clinical decision support process.
Abbreviations: ACR, American College of Radiology; accGPT, appropriateness criteria context aware GPT; accGPT-4, appropriateness criteria context aware GPT using GPT-4; GPT, generative pre-trained transformer; LLM, large language model.
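The abstract describes selecting guideline documents by semantic similarity and revealing the chosen source for auditability. The following is a minimal Python sketch of that general idea only, not the authors' implementation: it does not use LlamaIndex or GPT-4, it substitutes a simple bag-of-words cosine similarity for learned embeddings, and the file names, document texts, and function names (DOCUMENTS, tokenize, cosine, retrieve) are hypothetical placeholders rather than real ACR content.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder corpus: file names and texts are invented for
# illustration and contain no real guideline content.
DOCUMENTS = {
    "doc_headache.txt": "placeholder guideline text about headache imaging",
    "doc_low_back_pain.txt": "placeholder guideline text about low back pain imaging",
    "doc_chest_pain.txt": "placeholder guideline text about chest pain imaging",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the top_k (source_name, score) pairs most similar to the query.

    Returning the source name alongside the score is what makes the
    selection auditable: a reader can see which document was used.
    """
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(text))), name)
        for name, text in documents.items()
    ]
    scored.sort(reverse=True)  # deterministic: ties broken by name
    return [(name, score) for score, name in scored[:top_k]]


if __name__ == "__main__":
    for name, score in retrieve("patient with low back pain", DOCUMENTS):
        print(f"source: {name} (similarity {score:.2f})")
```

Because the ranking here is deterministic, repeating the same query always selects the same source document, which loosely mirrors the source consistency the abstract reports. In a real system the retrieved text would be passed to the language model as context together with the question.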