Abstract:

Background: Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT’s accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations.

Objective: The aim of this study was to evaluate ChatGPT’s ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations.

Methods: Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and the reproducibility were calculated. Reproducibility between days and between raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study.

Results: The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, –0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases “essential,” “recommended,” “best,” and “important” were used. Specifically, “essential” occurred in 4 out of 125, “recommended” in 12 out of 125, “best” in 6 out of 125, and “important” in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention.

Conclusions: The accuracy and reproducibility of ChatGPT in self-diagnosing five common orthopedic conditions were inconsistent. Accuracy could potentially be improved by adding symptoms that readily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in the accuracy of self-diagnosis. Given the risk of harm from self-diagnosis without medical follow-up, it would be prudent for an NLP tool to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study.
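The Methods name the Fleiss κ coefficient as the reproducibility measure but do not show the calculation. The following is a minimal Python sketch of the standard Fleiss κ formula, not the authors' own analysis code. The function name `fleiss_kappa`, the layout of the count table, and the example counts are all hypothetical and are not study data. Returning 1.0 when every rating falls into a single category (where κ is formally 0/0) is an assumed convention; the abstract does not say how that case was handled.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of ratings that placed item i in category j.
    Every row must contain the same total number of ratings (at least 2).
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_ratings = sum(counts[0])
    if n_ratings < 2:
        raise ValueError("need at least two ratings per item")
    if any(sum(row) != n_ratings for row in counts):
        raise ValueError("every item must have the same number of ratings")

    # Observed agreement: share of agreeing rating pairs per item, averaged.
    pairs = n_ratings * (n_ratings - 1)
    p_items = [(sum(c * c for c in row) - n_ratings) / pairs for row in counts]
    p_bar = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_ratings
    n_categories = len(counts[0])
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # All ratings fell into one category, so kappa is formally 0/0.
        # Assumption: report this degenerate case as perfect agreement.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical counts (NOT study data): 5 items, each rated 5 times.
    # Columns: correct, partially correct, incorrect, differential diagnosis.
    example = [
        [5, 0, 0, 0],
        [4, 1, 0, 0],
        [3, 0, 0, 2],
        [5, 0, 0, 0],
        [2, 1, 0, 2],
    ]
    print(f"Fleiss kappa: {fleiss_kappa(example):.3f}")
```

For between-day reproducibility, each row could hold one rater's answers across the five days; for between-rater reproducibility, each row could hold one day's answers across the raters. This mapping is likewise an assumption about how the tables might be arranged.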

The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study
