Abstract

Background: Chronic hepatitis B (CHB) imposes substantial economic and social burdens globally. Managing CHB involves complex monitoring and adherence challenges, particularly in regions such as China, where a high prevalence of CHB coincides with limited health care resources. This study explores whether ChatGPT-3.5, an emerging artificial intelligence (AI) assistant with notable capabilities in medical education and practice, can help address these complexities in CHB management, particularly in regions with distinct health care landscapes.

Objective: This study aimed to characterize ChatGPT-3.5's potential and limitations in providing personalized medical consultation assistance to patients with CHB across different linguistic contexts.

Methods: Questions drawn from published guidelines, online CHB communities, and search engines in English and Chinese were refined, translated, and compiled into 96 inquiries. Each question was posed to ChatGPT-3.5 and ChatGPT-4.0 in independent dialogues. Senior physicians evaluated the responses for informativeness, emotional management, consistency across repeated inquiries, and cautionary statements (disclaimers) regarding medical advice. A true-or-false questionnaire was also used to compare the accuracy of the two models' information on closed questions.

Results: Over half of the ChatGPT-3.5 responses (228/370, 61.6%) were rated comprehensive, compared with 74.5% (172/222) of ChatGPT-4.0 responses (P<.001). Performance was better in English, particularly in informativeness and consistency across repeated queries. However, emotional management guidance was deficient, appearing in only 3.2% (6/186) of ChatGPT-3.5 responses and 8.1% (15/186) of ChatGPT-4.0 responses (P=.04). Disclaimers appeared in 10.8% (24/222) of ChatGPT-3.5 responses and 13.1% (29/222) of ChatGPT-4.0 responses (P=.46). On the true-or-false questions, ChatGPT-4.0 achieved an accuracy of 93.3% (168/180), significantly higher than ChatGPT-3.5's 65.0% (117/180; P<.001).

Conclusions: In this study, ChatGPT demonstrated basic capabilities as a medical consultation assistant for CHB management. The working language appeared to influence ChatGPT-3.5's performance, particularly in its handling of terminology and colloquial language, which may limit its applicability for specific target populations. ChatGPT-4.0, the updated model, showed improved information processing and overcame the effect of language on information accuracy. This suggests that the effects of model advancement should be considered when selecting large language models as medical consultation assistants. Because both models performed poorly in emotional management guidance, language-specific training and emotional management strategies are important when deploying ChatGPT for medical purposes. Finally, the models' tendency to include disclaimers in conversations warrants further investigation to understand its impact on patients' experiences in practice.
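The Results compare proportions between the two models (for example, 24/222 vs 29/222 responses containing a disclaimer), but the abstract does not state which statistical test produced the P values. As a hedged illustration only, the sketch below applies a Pearson chi-square test without continuity correction to a 2×2 table of counts. The choice of test is an assumption, not the authors' stated method, and the function name `chi_square_2x2` is hypothetical. It uses only the Python standard library and relies on the identity that, for 1 degree of freedom, the chi-square survival function equals erfc(√(χ²/2)).

```python
import math


def chi_square_2x2(a_yes: int, a_n: int, b_yes: int, b_n: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for two proportions.

    Hypothetical helper: the abstract does not name the test the authors used.
    a_yes/a_n: count and total for the first model (e.g., ChatGPT-3.5).
    b_yes/b_n: count and total for the second model (e.g., ChatGPT-4.0).
    Returns (chi-square statistic, two-sided P value) with 1 degree of freedom.
    Assumes every expected cell count is nonzero.
    """
    observed = [
        [a_yes, a_n - a_yes],
        [b_yes, b_n - b_yes],
    ]
    row_totals = [a_n, b_n]
    col_totals = [a_yes + b_yes, (a_n - a_yes) + (b_n - b_yes)]
    grand_total = a_n + b_n

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of equal proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Counts from the Results: (ChatGPT-3.5 count, total, ChatGPT-4.0 count, total).
    comparisons = {
        "responses with a disclaimer": (24, 222, 29, 222),
        "correct true-or-false answers": (117, 180, 168, 180),
    }
    for label, (a_yes, a_n, b_yes, b_n) in comparisons.items():
        chi2, p = chi_square_2x2(a_yes, a_n, b_yes, b_n)
        print(f"{label}: {a_yes / a_n:.1%} vs {b_yes / b_n:.1%}, chi2={chi2:.2f}, P={p:.2g}")
```

Under this assumption, the disclaimer comparison gives P ≈ .46 and the true-or-false comparison gives P far below .001, in line with the reported values. Agreement of this kind is consistent with, but does not confirm, the test the authors actually used; applying Yates' continuity correction, for instance, would give a larger P value for the disclaimer comparison.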

Evaluation of the quality and readability of ChatGPT responses to frequently asked questions about myopia in traditional Chinese language

ChatGPT: is it good for our glaucoma patients?

ChatGPT for Addressing Patient-centered Frequently Asked Questions in Glaucoma Clinical Practice

Assessing the utility of ChatGPT as an artificial intelligence‐based large language model for information to answer questions on myopia

Uncovering Language Disparity of ChatGPT in Healthcare: Non-English Clinical Environment for Retinal Vascular Disease Classification

ChatGPT and retinal disease: a cross-sectional study on AI comprehension of clinical guidelines

ChatGPT and Google Assistant as a Source of Patient Education for Patients With Amblyopia: Content Analysis

Comparing the Ability of Google and ChatGPT to Accurately Respond to Oculoplastics-Related Patient Questions and Generate Customized Oculoplastics Patient Education Materials

Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control

Exploring the Accuracy and Readability of ChatGPT in Providing Information to Patients With Keratoconus

The Performance of Chatbots and the AAPOS Website as a Tool for Amblyopia Education

Screening/diagnosis of pediatric endocrine disorders through the artificial intelligence model in different language settings

Evaluating Chatbot responses to patient questions in the field of glaucoma

Performance of Popular Large Language Models in Glaucoma Patient Education: a Randomized Controlled Study

Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases

Assessing ChatGPT as a Medical Consultation Assistant for Chronic Hepatitis B: Cross-Language Study of English and Chinese

Evaluating the application of ChatGPT in China's residency training education: An exploratory study

Evaluation of the Appropriateness and Readability of ChatGPT-4 Responses to Patient Queries on Uveitis

Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis

Evaluating accuracy and reproducibility of ChatGPT responses to patient-based questions in Ophthalmology: An observational study