ACCURACY OF ARTIFICIAL INTELLIGENCE CHATBOTS’ REPLIES IN GREEK VS. ENGLISH AND IN ACCORDANCE WITH THE 2023 ESH GUIDELINES FOR THE MANAGEMENT OF ARTERIAL HYPERTENSION

Antonios Ioannidis, Dimitrios Tsounis, Georgios Bouras, Despoina Komninou, Apostolos Pechlevanis, Eleni Zalokosta, Eirini Mylona, Christina Sidera, Efthymia Markidou, Theodora Kafkia
DOI: https://doi.org/10.1097/01.hjh.0001019360.20265.fe
IF: 4.9
2024-05-01
Journal of Hypertension
Abstract:
Objective: The emergence of artificial intelligence (AI) chatbots has created new opportunities. This study aims to assess how well online AI chatbots capable of interacting in multiple languages respond in accordance with the 2023 ESH Guidelines.
Design and method: We structured 20 questions, in both English and Greek, covering issues addressed in the 2023 ESH Guidelines recommendations. The questions were submitted to four free online chatbots that can process questions in multiple languages. The responses were recorded and evaluated by three experienced cardiologists with a special interest in hypertension. To assess consistency, each question was asked three times, though only the first response was included in the accuracy analysis. All questions were preceded by ‘According to the 2023 ESH Guidelines for the management of arterial hypertension’. A response was considered ‘accurate’ if it included all essential information, ‘inaccurate’ if it was not in accordance with the guidelines, and ‘incomplete’ if any essential information was missing.
Results: In total, 160 responses were recorded (80 in Greek and 80 in English). A total of 62 (38.8%) responses were deemed accurate, with a significant difference between the languages (30% for Greek vs 47.5% for English responses), ranging from only 2 out of 20 (10%, YOU.COM in Greek) to 13 out of 20 (65%, BARD in English). Eighty-five (53.1%) of the responses were judged inaccurate and 13 (8.1%) incomplete. Two questions received no accurate response from any chatbot in either language. Moreover, 138 of the 160 regenerated responses (86.3%) were consistent with the initial answer. No chatbot would have answered every question accurately even if the regenerated responses had been considered.
Conclusions: The accuracy of the responses generated by four popular AI chatbots varied when they were asked about issues covered in the 2023 ESH Guidelines. The observed accuracy was lower for Greek. While the use of chat-based AI in medicine is still in its early stages and current models are not intended for medical use, the potential for such technology is significant.
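The proportions reported in the Results can be recomputed from the stated counts. The short Python sketch below is illustrative only and is not part of the study: the variable names are hypothetical, and the per-language accurate counts (24 and 38) are not reported in the abstract but are derived here from the stated 30% and 47.5% of 80 responses. Half-up rounding is assumed, since it reproduces the reported 86.3%.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL = 160          # responses recorded (first answers only)
PER_LANGUAGE = 80    # 20 questions x 4 chatbots per language


def pct(part, whole):
    """Return part/whole as a percentage rounded half-up to one decimal."""
    value = Decimal(100 * part) / Decimal(whole)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Counts stated in the abstract.
counts = {"accurate": 62, "inaccurate": 85, "incomplete": 13}
shares = {label: pct(n, TOTAL) for label, n in counts.items()}
consistent_share = pct(138, TOTAL)

# Derived (assumption): accurate counts implied by the reported percentages.
accurate_greek = int(Decimal("30") * PER_LANGUAGE / 100)
accurate_english = int(Decimal("47.5") * PER_LANGUAGE / 100)

for label, share in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL} = {share}%")
print(f"consistent on regeneration: 138/{TOTAL} = {consistent_share}%")
print(f"accurate, Greek + English: {accurate_greek} + {accurate_english}")
```

The three categories sum to 160, and the derived per-language counts sum to the 62 accurate responses reported overall, so the figures in the abstract are internally consistent.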
peripheral vascular disease