Comparative evaluation of artificial intelligence systems' accuracy in providing medical drug dosages: A methodological study

Swaminathan Ramasubramanian,Sangeetha Balaji,Tejashri Kannan,Naveen Jeyaraman,Shilpa Sharma,Filippo Migliorini,Suhasini Balasubramaniam,Madhan Jeyaraman
DOI: https://doi.org/10.5662/wjm.v14.i4.92802
2024-12-20
World Journal of Methodology
Abstract:BACKGROUND Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated. AIM To evaluate the accuracy of AI systems (ChatGPT 3.5, ChatGPT 4, Google Bard) in providing drug dosage information per Harrison's Principles of Internal Medicine. METHODS A set of natural language queries mimicking real-world medical dosage inquiries was presented to the AI systems. Responses were analyzed using a 3-point Likert scale. The analysis, conducted with Python and its libraries, focused on basic statistics, overall system accuracy, and disease-specific and organ system accuracies. RESULTS ChatGPT 4 outperformed the other systems, showing the highest rate of correct responses (83.77%) and the best overall weighted accuracy (0.6775). Disease-specific accuracy varied notably across systems, with some diseases being accurately recognized, while others demonstrated significant discrepancies. Organ system accuracy also showed variable results, underscoring system-specific strengths and weaknesses. CONCLUSION ChatGPT 4 demonstrates superior reliability in medical dosage information, yet variations across diseases emphasize the need for ongoing improvements. These results highlight AI's potential in aiding healthcare professionals, urging continuous development for dependable accuracy in critical medical situations.
What problem does this paper attempt to address?