Abstract:

Background and objective: ChatGPT and Google Bard AI are widely used conversational chatbots, including in healthcare. While they have several strengths, they can generate seemingly correct but erroneous responses, warranting caution in medical contexts. In an era where access to abortion care is diminishing, patients may increasingly rely on online resources and AI-driven language models for information on medication abortions. This study therefore aimed to compare the accuracy and comprehensiveness of responses generated by ChatGPT 3.5 and Google Bard AI to medical queries about medication abortions.

Methods: Fourteen open-ended questions about medication abortion were formulated based on the Frequently Asked Questions (FAQs) from the National Abortion Federation (NAF) and the Reproductive Health Access Project (RHAP) websites. These questions were submitted to ChatGPT version 3.5 and Google Bard AI on October 7, 2023. The accuracy of the responses was analyzed by cross-referencing the generated answers against the information provided by NAF and RHAP. Any discrepancies were further verified against the guidelines from the American Congress of Obstetricians and Gynecologists (ACOG). The rating scale used by Johnson et al. was employed for assessment: a 6-point Likert scale [ranging from 1 (completely incorrect) to 6 (correct)] to evaluate accuracy and a 3-point scale [ranging from 1 (incomplete) to 3 (comprehensive)] to assess completeness. Questions that did not yield answers were assigned a score of 0 and omitted from the correlation analysis. Data analysis and visualization were done using R Software version 4.3.1. Statistical significance was assessed using Spearman's rank correlation (rs) and the Mann-Whitney U test.

Results: All questions were entered sequentially into both chatbots by the same author. On the initial attempt, ChatGPT generated relevant responses for all questions, while Google Bard AI failed to provide answers for five questions. Of these five, one was answered when the same question was repeated, two were answered after rephrasing, and two remained unanswered despite rephrasing. ChatGPT had a median accuracy score of 5 (mean: 5.26, SD: 0.73) and a median completeness score of 3 (mean: 2.57, SD: 0.51). It received the highest accuracy score in six responses and the highest completeness score in eight responses. In contrast, Google Bard AI had a median accuracy score of 5 (mean: 4.5, SD: 2.03) and a median completeness score of 2 (mean: 2.14, SD: 1.03). It received the highest accuracy score in five responses and the highest completeness score in six responses. Spearman's correlation showed no statistically significant correlation between accuracy and completeness for ChatGPT (rs = -0.46771, p = 0.09171), whereas Google Bard AI showed a marginally significant correlation (rs = 0.5738, p = 0.05108). The Mann-Whitney U test indicated no statistically significant differences between ChatGPT and Google Bard AI in accuracy (U = 82, p > 0.05) or completeness (U = 78, p > 0.05).

Conclusion: Both chatbots showed similar levels of accuracy, but minor errors were noted in finer aspects that demand specialized knowledge of abortion care. This could explain the lack of a significant correlation between accuracy and completeness. Ultimately, AI-driven language models have the potential to provide information on medication abortions, but there is a need for continual refinement and oversight.
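As a rough illustration of the two tests named in the Methods, the sketch below computes Spearman's rank correlation and the Mann-Whitney U statistic in plain Python. The study itself used R 4.3.1; the function names and the scores here are hypothetical and invented for illustration, and they are not the study data. P-values are not computed.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rs: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def mann_whitney_u(a, b):
    """Return (U_a, U_b), computed from the rank sum of sample a."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical scores (NOT the study data); 0 marks an unanswered question.
accuracy = [6, 5, 0, 4, 6, 5, 3]
completeness = [3, 2, 0, 2, 3, 3, 1]

# Unanswered questions (score 0) are dropped before the correlation,
# mirroring the exclusion rule described in the Methods.
pairs = [(a, c) for a, c in zip(accuracy, completeness) if a != 0]
rs = spearman_rs([a for a, _ in pairs], [c for _, c in pairs])
print(f"Spearman rs (answered questions only): {rs:.3f}")
```

Mid-ranks are used for ties because ordinal scores on a 6-point or 3-point scale tie frequently; a production analysis would normally rely on an established statistics package rather than hand-rolled functions like these.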

Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI