Is ChatGPT Better Than Epileptologists at Interpreting Seizure Semiology?

Meng Jiao, Yaxi Luo, Neel Fotedar, Jun-En Ding, Ioannis Karakis, Vikram R Rao, Melissa Asmar, Xiaochen Xian, Orwa Aboud, Yuxin Wen, Jack J Lin, Fang-Ming Hung, Hai Sun, Felix Rosenow, Feng Liu
DOI: https://doi.org/10.1101/2024.04.13.24305773
2024-09-13
Abstract: Objective: This study aims to evaluate the clinical value of a representative large language model (LLM), ChatGPT, in interpreting seizure semiology to localize epileptogenic zones (EZs) for presurgical assessment in patients with focal epilepsy. Method: We compiled two data cohorts, one from public sources and one from a private database. The public cohort consists of 852 semiology-EZ pairs derived from 193 peer-reviewed journal publications. The private cohort includes 184 semiology-EZ pairs collected from the Far Eastern Memorial Hospital (FEMH) in Taiwan. ChatGPT was asked to generate the most likely EZ locations based on the semiology records from both cohorts using two prompting methods: zero-shot prompting (ZSP) and few-shot prompting (FSP). To compare ChatGPT's performance with that of epileptologists, a panel of eight epileptologists was recruited for an online survey to provide their interpretations of 100 randomly selected semiology records. The responses from ChatGPT and the epileptologists were compared using three metrics: regional sensitivity (RSens), weighted sensitivity (WSens), and net positive inference rate (NPIR). Results: In interpreting seizure semiology, ChatGPT consistently achieved over 80% sensitivity for the frontal and temporal lobes, approximately 40% for the occipital lobe, 20-30% for the parietal lobe, 20% for the insular cortex, and 0% for the cingulate cortex in both data cohorts. Compared with the epileptologists' responses, ChatGPT-4 outperformed epileptologists in localizing the frontal and temporal lobes, showed similar accuracy for the occipital and parietal lobes, and underperformed in the insular and cingulate cortices. ChatGPT and the epileptologists had comparable values for WSens and mean NPIR. Significance: ChatGPT was shown to be a clinically valuable tool for assisting decision-making in the preoperative epilepsy workup. With ongoing advancements in LLMs, the reliability and accuracy of ChatGPT are expected to continue improving.
Neurology
What problem does this paper attempt to address?
The paper evaluates the clinical value of large language models (specifically ChatGPT) in interpreting seizure semiology to localize epileptogenic zones (EZs), with the goal of assisting presurgical evaluation in focal epilepsy. ChatGPT was tested on two cohorts: a public cohort of 852 semiology-EZ pairs drawn from 193 peer-reviewed publications, and a private cohort of 184 semiology-EZ pairs from the Far Eastern Memorial Hospital (FEMH) in Taiwan. It was prompted with both zero-shot and few-shot methods, and its output was scored with three metrics: regional sensitivity (RSens), weighted sensitivity (WSens), and net positive inference rate (NPIR). To provide a human baseline, a panel of eight epileptologists interpreted 100 randomly selected semiology records in an online survey. ChatGPT outperformed the epileptologists in localizing the frontal and temporal lobes, showed similar accuracy for the occipital and parietal lobes, and underperformed in the insular and cingulate cortices. WSens and mean NPIR were comparable between ChatGPT and the epileptologists.
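As a rough illustration of what a per-region sensitivity score involves, the sketch below computes, for each brain region, the fraction of records with a true EZ in that region for which the region was also predicted. This is the generic definition of sensitivity and is only an assumption about how RSens is calculated; the exact definitions of RSens, WSens, and NPIR are not given in the text above. The function name, the region labels, and the toy records are hypothetical and are not data from the study.

```python
from collections import defaultdict

# Region labels taken from the abstract; the spelling used here is an assumption.
REGIONS = ["frontal", "temporal", "parietal", "occipital", "insular", "cingulate"]


def regional_sensitivity(records):
    """Return per-region sensitivity: hits / records with a true EZ in that region.

    `records` is an iterable of (true_regions, predicted_regions) pairs,
    each a set of region names. Regions with no true cases are omitted.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true_regions, predicted_regions in records:
        for region in true_regions:
            totals[region] += 1
            if region in predicted_regions:
                hits[region] += 1
    return {r: hits[r] / totals[r] for r in REGIONS if totals[r] > 0}


# Toy records invented for illustration only (not from either cohort).
toy_records = [
    ({"temporal"}, {"temporal", "frontal"}),
    ({"temporal"}, {"temporal"}),
    ({"frontal"}, {"frontal"}),
    ({"insular"}, {"temporal"}),
    ({"parietal", "frontal"}, {"frontal"}),
]

scores = regional_sensitivity(toy_records)
for region, value in scores.items():
    print(f"{region}: {value:.2f}")
```

Scoring each region separately, rather than pooling all records into one accuracy figure, is what makes a pattern like the one reported in the abstract visible: high values for frequently involved regions can coexist with very low values for rarely involved ones.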