The Clinical Utility of Large Language Models in Diagnosing Neurocognitive Disorders among NACC Participants

Hanan S Rafiuddin, Charlie Su, Allan A Dimmick, David Cicero
DOI: https://doi.org/10.1093/arclin/acae067.014
2024-09-12
Archives of Clinical Neuropsychology
Abstract:
Objective: Our study aimed to investigate ChatGPT's ability to diagnose neurocognitive disorders in older adults from neuropsychological interviews and test results, and to compare its accuracy with neuropsychologists' consensus diagnoses.
Methods: We prompted ChatGPT-4 to provide a provisional diagnosis (normal changes, mild cognitive impairment, or dementia) for each case and to identify the underlying etiology. We evaluated interrater reliability (IRR) with Cohen's kappa and chi-square tests. Receiver operating characteristic (ROC) curves and area-under-the-curve (AUC) values were computed to explore which measures predicted diagnostic agreement.
Results: Participants were 15,048 older adults (mean age = 69.13 years; 56.7% female) assessed at Alzheimer's Disease Core Centers across the United States. For cognitive diagnoses, Cohen's kappa indicated fair agreement (κ = 0.42, p < 0.001), and the chi-square test [χ²(6) = 7896.51, p < 0.001] revealed a significant, strong association (Cramér's V = 0.51, p < 0.001) between ChatGPT's and clinicians' diagnoses. For the underlying etiology, Cohen's kappa indicated fair agreement on specific diagnoses (κ = 0.38, p < 0.001), and the chi-square test [χ²(24) = 5353.04, p < 0.001] again revealed a significant, strong association (Cramér's V = 0.30, p < 0.001). ROC curves showed that no variable predicted diagnostic agreement above chance.
Conclusion: Our findings demonstrate the potential clinical utility of integrating large language models (LLMs) into neuropsychology. Although significant IRR was found between ChatGPT's and neuropsychologists' diagnoses, further research is needed before LLMs are adopted into clinical practice, given ongoing developments of artificial intelligence in medicine.
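The abstract names its agreement statistics but not how they were computed. As a minimal sketch of the standard formulas, assuming each case carries one clinician label and one LLM label, the Python below computes unweighted Cohen's kappa, the Pearson chi-square statistic with Cramér's V, and an ROC AUC via the Mann-Whitney formulation. The helper names, the three-category labels, the predictor scores, and all toy data are hypothetical and are not the study's data or analysis code. p-values are omitted here; in practice one would likely rely on library routines such as scipy.stats.chi2_contingency, sklearn.metrics.cohen_kappa_score, and sklearn.metrics.roc_auc_score.

```python
from math import sqrt

def confusion_table(rater_a, rater_b, categories):
    """Cross-tabulate paired labels: rows = rater_a, columns = rater_b."""
    index = {c: i for i, c in enumerate(categories)}
    table = [[0] * len(categories) for _ in categories]
    for a, b in zip(rater_a, rater_b):
        table[index[a]][index[b]] += 1
    return table

def cohens_kappa(table):
    """Unweighted Cohen's kappa, (p_o - p_e) / (1 - p_e), from a square table."""
    n = sum(map(sum, table))
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n                     # observed agreement
    row = [sum(r) / n for r in table]                                 # rater A marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B marginals
    p_e = sum(r * c for r, c in zip(row, col))                        # chance agreement
    return (p_o - p_e) / (1 - p_e)

def chi_square_and_cramers_v(table):
    """Pearson chi-square statistic, its degrees of freedom, and Cramér's V."""
    # Drop all-zero rows/columns so every expected count is positive.
    rows = [r for r in table if sum(r) > 0]
    keep = [j for j in range(len(rows[0])) if sum(r[j] for r in rows) > 0]
    t = [[r[j] for j in keep] for r in rows]
    n = sum(map(sum, t))
    row_tot = [sum(r) for r in t]
    col_tot = [sum(r[j] for r in t) for j in range(len(keep))]
    chi2 = 0.0
    for i, r in enumerate(t):
        for j, observed in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(t) - 1) * (len(keep) - 1)
    v = sqrt(chi2 / (n * (min(len(t), len(keep)) - 1)))
    return chi2, df, v

def roc_auc(scores, labels):
    """AUC as P(score of a positive case > score of a negative case); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: clinician vs. LLM cognitive diagnoses for 10 cases.
categories = ["normal", "MCI", "dementia"]
clinician = ["normal"] * 4 + ["MCI"] * 3 + ["dementia"] * 3
llm = ["normal", "normal", "normal", "MCI",
       "MCI", "MCI", "dementia",
       "dementia", "dementia", "MCI"]
table = confusion_table(clinician, llm, categories)

chi2, df, v = chi_square_and_cramers_v(table)
print(f"kappa = {cohens_kappa(table):.3f}")              # kappa = 0.552
print(f"chi2({df}) = {chi2:.3f}, Cramer's V = {v:.3f}")  # chi2(4) = 7.847, Cramer's V = 0.626

# Hypothetical predictor score per case vs. whether the raters agreed (1) or not (0).
agree_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
agreed = [1, 1, 1, 0, 0, 0]
print(f"AUC = {roc_auc(agree_scores, agreed):.3f}")      # AUC = 0.889
```

With these made-up labels, 7 of 10 cases agree (p_o = 0.70) against a chance agreement of p_e = 0.33, giving κ ≈ 0.55. Because Cramér's V divides chi-square by n·(min(r, c) − 1), its conventional small/medium/large thresholds depend on the table's dimensions, so V values from differently sized tables (such as the 6-df and 24-df tables above) are best read with that in mind.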
psychology, clinical