Evaluation of ChatGPT as a counselling tool for Italian-speaking MASLD patients: assessment of accuracy, completeness and comprehensiveness

N. Pugliese, D. Polverini, R. Lombardi, G. Pennisi, F. Ravaioli, A. Armandi, E. Buzzetti, A. Dalbeni, A. Liguori, A. Mantovani, R. Villani, L. Valenti, L. Miele, S. Petta, G. Sebastiani, C. Hassan, A. Aghemo
DOI: https://doi.org/10.1016/j.dld.2024.01.091
IF: 5.165
2024-02-01
Digestive and Liver Disease
Abstract: Metabolic dysfunction-associated steatotic liver disease (MASLD) is a significant global public health concern and is expected to become the leading indication for liver transplantation in the coming decades. Chatbots, which use artificial intelligence (AI) to simulate conversations with users, could provide counselling and support to English-speaking patients with MASLD. In a recent study, we showed that while ChatGPT 3.5 is complete and comprehensible in answering MASLD-related questions, its accuracy is still suboptimal. Whether language modifies these findings is unclear. We evaluated the accuracy, completeness and comprehensibility of ChatGPT 3.5 in answering 15 pre-set questions about MASLD in Italian. The questions were grouped into three domains: specialist referral, physical activity, and dietary composition. ChatGPT responses were rated on a 6-point accuracy scale, a 3-point completeness scale, and a 3-point comprehensibility scale by 13 native Italian MASLD experts. The mean scores for accuracy and completeness were 4.57 ± 0.42 and 2.53 ± 0.51 respectively, with a mean score of 2.91 ± 0.07 for comprehensibility. The physical activity domain received the highest mean score, with 4.82 ± 0.22 and 2.35 ± 0.11 for accuracy and completeness respectively. The mean Kendall's coefficient of concordance for accuracy, completeness and comprehensibility across all 15 questions was 0.524, 0.623 and 0.73 respectively. Age and academic role of the evaluators did not affect the scores. The scores were not significantly different from those reported in our previous study focusing on the English language. In conclusion, we have shown that language does not affect the ability of ChatGPT to provide complete and understandable counselling for MASLD patients; however, its accuracy remains suboptimal in certain domains.
To ensure the trustworthiness of medical information provided by AI, the collaboration between healthcare professionals, patient associations and medical literature databases needs to be further improved.
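The abstract reports inter-rater agreement as Kendall's coefficient of concordance (W) without describing the computation. As an illustration only, the sketch below computes the standard tie-corrected W for a matrix of ratings, where 0 means no agreement and 1 means perfect agreement. The function names `average_ranks` and `kendalls_w` and the rater-by-item input layout are assumptions made for this example; the authors' actual procedure and data are not given in the abstract.

```python
from collections import Counter


def average_ranks(scores):
    """Rank one rater's scores (1 = lowest); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one (a tie group).
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W.

    ratings[j][i] is the score rater j gave to item i (hypothetical layout).
    """
    m = len(ratings)       # number of raters
    n = len(ratings[0])    # number of rated items
    ranked = [average_ranks(row) for row in ratings]
    # Sum of ranks each item received across all raters.
    totals = [sum(ranked[j][i] for j in range(m)) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over every tie group of every rater.
    ties = sum(sum(c ** 3 - c for c in Counter(row).values()) for row in ratings)
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater scores all items identically")
    return 12 * s / denom
```

Because ratings on a short ordinal scale produce many tied scores, the tie correction matters here: without it, W would be biased downward even when raters agree closely.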
gastroenterology & hepatology