Development and initial evaluation of a conversational agent for Alzheimer's disease

Natalia Castano-Villegas, Isabella Llano, Maria Camila Villa, Julian Martinez, Jose Zea, Tatiana Urrea, Alejandra Maria Banol, Carlos Bohorquez, Nelson Martinez
DOI: https://doi.org/10.1101/2024.09.04.24312955
2024-09-06
Abstract:
Background: Conversational agents (CAs) have attracted attention for personal and professional use, and their specialisation in the medical field is being explored. CAs have achieved passing-level performance in medical school examinations and shown empathy when responding to patient questions. Alzheimer's disease is characterized by progressive cognitive and somatic decline. As the leading cause of dementia in the elderly, it is the subject of continuous investigation, which results in a constant stream of new information. Physicians are expected to keep up with the latest clinical guidelines; however, they are not always able to do so because of the volume of information and their busy schedules.
Objective: We designed a conversational agent intended for general physicians as a tool for their everyday practice, offering validated responses to clinical queries about Alzheimer's disease based on the best available evidence.
Methodology: The conversational agent uses GPT-4o and has been instructed to respond based on 17 updated national and international clinical practice guidelines on dementia and Alzheimer's disease. To assess the CA's performance and accuracy, we tested it on three validated knowledge scales. To evaluate the content of each of the assistant's answers, seven human evaluators rated the clinical understanding, retrieval, clinical reasoning, completeness, and usefulness of the CA's output.
Results: The agent obtained near-perfect performance on all three scales. It achieved a sensitivity of 100% on all three scales and a specificity of 75% in the less specific model. However, when the input given to the assistant was modified (prompting), specificity reached 100%, with a Cohen's kappa of 1 in all tests. The human evaluation determined that the CA's output showed comprehension of the clinical question and completeness in its answers. However, reference retrieval and the perceived helpfulness of the CA's replies were not optimal.
Conclusions: This study demonstrates the potential of the agent and of specialised LLMs in the medical field as a tool for up-to-date clinical information, particularly when medical knowledge is becoming increasingly vast and ever-changing. Validations with health care experts and actual clinical use of the assistant by its target audience is
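The sensitivity, specificity, and Cohen's kappa figures quoted in the Results can be illustrated with a minimal sketch of how such metrics are typically derived from a 2x2 table of agent answers versus an answer key. The function name `binary_agreement_metrics` and all counts below are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not state how the authors computed these values.

```python
def binary_agreement_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 table.

    tp/fp/fn/tn are counts of the agent's true/false answers compared
    with the answer key of a true/false knowledge scale (hypothetical setup).
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of key-positive items answered positive
    specificity = tn / (tn + fp)   # share of key-negative items answered negative

    # Observed agreement between the agent and the key.
    observed = (tp + tn) / total
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    if expected == 1:
        # Degenerate table (all items in one class): kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Illustrative counts only (not taken from the paper): a 10-item scale on
# which the agent makes one false-positive error, then the same scale
# answered without errors.
one_error = binary_agreement_metrics(tp=6, fp=1, fn=0, tn=3)
no_errors = binary_agreement_metrics(tp=6, fp=0, fn=0, tn=4)
print(one_error)
print(no_errors)
```

With the illustrative counts, a single false positive leaves sensitivity at 100% while pulling specificity to 75% and kappa below 1; an error-free run yields 100% on both measures and a kappa of 1, the pattern the abstract describes after prompting was modified.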