Evaluating the Potential of Large Language Models for Vestibular Rehabilitation Education: A Comparison of ChatGPT, Google Gemini, and Clinicians

Yael Arbel, Yoav Gimmon, Liora Shmueli
DOI: https://doi.org/10.1101/2024.01.24.24301737
2024-05-19
Abstract:
Objective: To evaluate the accuracy, completeness, and explanations of ChatGPT's responses to multiple-choice questions on vestibular rehabilitation.
Study Design: The study included 30 physical therapists experienced in vestibular rehabilitation and 30 physical therapy students. Participants completed a Vestibular Knowledge Test (VKT) of 20 multiple-choice questions grouped into three categories: (1) Clinical Knowledge, (2) Basic Clinical Practice, and (3) Clinical Reasoning. In May 2023, ChatGPT was also asked to answer the same 20 VKT questions and to provide a rationale for each answer. Three board-certified otoneurologists independently rated the accuracy of each ChatGPT response on a 4-level scale.
Results: ChatGPT correctly answered 14 of the 20 multiple-choice questions (70%). It excelled in Clinical Knowledge (100%) but struggled in Clinical Reasoning (50%). According to the three otoneurology experts, the accuracy of ChatGPT's responses was "comprehensive" for 9 of the 20 questions (45%), while 5 responses (25%) were "completely incorrect". ChatGPT gave "comprehensive" responses to 50% of Clinical Knowledge and Basic Clinical Practice questions, but to only 25% of Clinical Reasoning questions.
Conclusion: Caution is advised when using the current version of ChatGPT because of its limited accuracy in clinical reasoning. Although it answers Clinical Knowledge questions accurately, its reliance on web information may lead to inconsistencies. Healthcare professionals should formulate questions carefully and be aware that the online prevalence of information may influence ChatGPT's responses. Combining clinical expertise and guidelines with ChatGPT can maximize its benefits while mitigating its limitations.
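As a quick arithmetic check, the headline percentages follow directly from the counts reported in the abstract (14, 9, and 5 of the 20 VKT questions). The snippet below is only an illustrative sketch: the counts are taken from the abstract, while the variable and function names (`reported_counts`, `as_percent`) are hypothetical and not part of the study's materials. Per-category percentages are not recomputed because the abstract does not state how many questions fall in each category.

```python
# Counts reported in the abstract, out of the 20 VKT questions.
TOTAL_QUESTIONS = 20
reported_counts = {
    "answered correctly": 14,
    "rated comprehensive by experts": 9,
    "rated completely incorrect by experts": 5,
}


def as_percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical helper mapping: label -> percentage of the 20 questions.
percentages = {
    label: as_percent(n, TOTAL_QUESTIONS) for label, n in reported_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Running it prints 70.0%, 45.0%, and 25.0%, matching the figures stated in the Results.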