Histochemical studies on idiopathic medionecrosis of the aorta.

J. Raekallio

1958-12-01

Abstract:

An evaluation of orthodontic information quality regarding artificial intelligence (AI) chatbot technologies: A comparison of ChatGPT and google BARD

Can Arslan,Kaan Kahya,Emre Cesur,Derya Germec Cakan

DOI: https://doi.org/10.2478/aoj-2024-0012

2024-01-01

Australasian Orthodontic Journal

Abstract: Introduction: In recent times, chatbots have played an increasing and noteworthy role in the field of medical practice. The present research was conducted to evaluate the accuracy of the responses provided by ChatGPT and BARD, two of the most utilised chatbot programs, when interrogated regarding orthodontics. Materials and methods: Twenty-four popular questions about conventional braces, clear aligners, orthognathic surgery, and orthodontic retainers were chosen for the study. The questions were submitted to the ChatGPT and Google BARD platforms, and an experienced orthodontist and an orthodontic resident rated the responses using a five-point Likert scale, with five indicating evidence-based information, four indicating adequate information, three indicating insufficient information, two indicating incorrect information, and one indicating no response. The results were recorded in Microsoft Excel for comparison and analysis. Results: No correlation was found between the ChatGPT and Google BARD scores and word counts. However, a moderate to significant relationship was observed between the scores and the number of listed references. No significant association was found between the number of words and references, and a statistically significant difference was observed in both investigators’ numerical rating scales using the AI tools (p = 0.014 and p = 0.030, respectively). Conclusion: Generally, ChatGPT and BARD provide satisfactory responses to common orthodontic inquiries that patients might ask. ChatGPT’s answers marginally surpassed those of Google BARD in quality.

dentistry, oral surgery & medicine
The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard

Baraa Daraqel,Khaled Wafaie,Hisham Mohammed,Li Cao,Samer Mheissen,Yang Liu,Leilei Zheng

DOI: https://doi.org/10.1016/j.ajodo.2024.01.012

IF: 2.711

2024-03-16

American Journal of Orthodontics and Dentofacial Orthopedics

Abstract:Introduction This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bidirectional Encoder Representations from Transformers (Google Bard; Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions. Methods A team of orthodontic specialists developed a set of 100 questions in 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded and independent assessors. The quality of AI-generated responses was evaluated using a newly developed tool for accuracy of information and completeness. In addition, response generation time and length were recorded. Results The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR]: 8-9) for ChatGPT and 8 (IQR: 8-9) for Google Bard (Median difference: 1; P <0.001). The median completeness score was similar in both models, with 8 (IQR: 8-9) for ChatGPT and 8 (IQR: 7-9) for Google Bard. The odds of accuracy and completeness were higher by 31% and 23% in ChatGPT than in Google Bard. Google Bard's response generation time was significantly shorter than that of ChatGPT by 10.4 second/question. However, both models were similar in terms of response length generation. Conclusions Both ChatGPT and Google Bard generated responses were rated with a high level of accuracy and completeness to the posed general orthodontic questions. However, acquiring answers was generally faster using the Google Bard model.

dentistry, oral surgery & medicine
Content analysis of AI-generated (ChatGPT) responses concerning orthodontic clear aligners

Sarah Abu Arqub,Dalya Al-Moghrabi,Veerasathpurush Allareddy,Madhur Upadhyay,Nikhilesh Vaid,Sumit Yadav

DOI: https://doi.org/10.2319/071123-484.1

2024-01-10

The Angle Orthodontist

Abstract: Objectives: To assess the accuracy of ChatGPT answers concerning orthodontic clear aligners. Materials and Methods: A cross-sectional content analysis of ChatGPT generated responses to queries related to clear aligner treatment (CAT) was undertaken. A total of 111 questions were generated by three orthodontists based on a set of predefined domains and subdomains. The artificial intelligence (AI)-generated (ChatGPT) answers were extracted and their accuracy was determined independently by five orthodontists. The accuracy of answers was assessed using a prepiloted four-point scale scoring rubric. Descriptive statistics were performed. Results: The total mean accuracy score for the entire set was 2.6 ± 1.1. It was noted that 58% of the AI-generated answers were scored as objectively true, 18% were selected facts, 9% were minimal facts, and 15% were false. False claims included the ability of CAT to reduce the need for orthognathic surgery (4.0 ± 0.0), improve airway function (3.8 ± 0.5), achieve root parallelism (3.6 ± 0.5), alleviate sleep apnea (3.8 ± 0.5), and produce more stable results compared to fixed appliances (3.8 ± 0.5). Conclusions: The overall level of accuracy of ChatGPT responses to questions concerning CAT was suboptimal and lacked citations to relevant literature. Ability of the software to offer current and precise information was limited. Therefore, clinicians and patients must be mindful of false claims and relevant facts omitted in the answers generated by ChatGPT.

dentistry, oral surgery & medicine
Evidence-based potential of generative artificial intelligence large language models in orthodontics: a comparative study of ChatGPT, Google Bard, and Microsoft Bing

Miltiadis A Makrygiannakis,Kostis Giannakopoulos,Eleftherios G Kaklamanos

DOI: https://doi.org/10.1093/ejo/cjae017

IF: 3.131

2024-04-13

European Journal of Orthodontics

Abstract: Background: The increasing utilization of large language models (LLMs) in Generative Artificial Intelligence across various medical and dental fields, and specifically orthodontics, raises questions about their accuracy. Objective: This study aimed to assess and compare the answers offered by four LLMs: Google’s Bard, OpenAI’s ChatGPT-3.5 and ChatGPT-4, and Microsoft’s Bing, in response to clinically relevant questions within the field of orthodontics. Materials and methods: Ten open-type clinical orthodontics-related questions were posed to the LLMs. The responses provided by the LLMs were assessed on a scale ranging from 0 (minimum) to 10 (maximum) points, benchmarked against robust scientific evidence, including consensus statements and systematic reviews, using a predefined rubric. After a 4-week interval from the initial evaluation, the answers were reevaluated to gauge intra-evaluator reliability. Statistical comparisons were conducted on the scores using Friedman’s and Wilcoxon’s tests to identify the model providing the answers with the most comprehensiveness, scientific accuracy, clarity, and relevance. Results: Overall, no statistically significant differences between the scores given by the two evaluators, on both scoring occasions, were detected, so an average score for every LLM was computed. The LLM answers scoring the highest were those of Microsoft Bing Chat (average score = 7.1), followed by ChatGPT-4 (average score = 4.7), Google Bard (average score = 4.6), and finally ChatGPT-3.5 (average score = 3.8). Although Microsoft Bing Chat statistically outperformed ChatGPT-3.5 (P-value = 0.017) and Google Bard (P-value = 0.029), and ChatGPT-4 outperformed ChatGPT-3.5 (P-value = 0.011), all models occasionally produced answers with a lack of comprehensiveness, scientific accuracy, clarity, and relevance. Limitations: The questions asked were indicative and did not cover the entire field of orthodontics. Conclusions: Language models (LLMs) show great potential in supporting evidence-based orthodontics. However, their current limitations pose a potential risk of making incorrect healthcare decisions if utilized without careful consideration. Consequently, these tools cannot serve as a substitute for the orthodontist’s essential critical thinking and comprehensive subject knowledge. For effective integration into practice, further research, clinical validation, and enhancements to the models are essential. Clinicians must be mindful of the limitations of LLMs, as their imprudent utilization could have adverse effects on patient care.

dentistry, oral surgery & medicine
How reliable is the artificial intelligence product large language model ChatGPT in orthodontics?

Kevser Kurt Demirsoy,Suleyman Kutalmış Buyuk,Tayyip Bicer

DOI: https://doi.org/10.2319/031224-207.1

2024-08-14

Abstract: Objectives: To evaluate the reliability of information produced by the artificial intelligence-based program ChatGPT in terms of accuracy and relevance, as assessed by orthodontists, dental students, and individuals seeking orthodontic treatment. Materials and methods: Frequently asked and curious questions in four basic areas related to orthodontics were prepared and asked in ChatGPT (Version 4.0), and answers were evaluated by three different groups (senior dental students, individuals seeking orthodontic treatment, orthodontists). Questions asked in these basic areas of orthodontics were about: clear aligners (CA), lingual orthodontics (LO), esthetic braces (EB), and temporomandibular disorders (TMD). The answers were evaluated by the Global Quality Scale (GQS) and Quality Criteria for Consumer Health Information (DISCERN) scale. Results: The total mean DISCERN score for answers on CA for students was 51.7 ± 9.38, for patients was 57.2 ± 10.73, and for orthodontists was 47.4 ± 4.78 (P = .001). Comparison of GQS scores for LO among groups: students (3.53 ± 0.78), patients (4.40 ± 0.72), and orthodontists (3.63 ± 0.72) (P < .001). Intergroup comparison of ChatGPT evaluations about TMD was examined in terms of the DISCERN scale, with the highest value given in the patients group (57.83 ± 11.47) and lowest value in the orthodontist group (45.90 ± 11.84). When information quality evaluation about EB was examined, GQS scores were >3 in all three groups (students: 3.50 ± 0.78; patients: 4.17 ± 0.87; orthodontists: 3.50 ± 0.82). Conclusions: ChatGPT has significant potential in terms of usability for patient information and education in the field of orthodontics if it is developed and necessary updates are made.
The Ilizarov Method for Correction of Complex Deformities. Psychological and Functional Outcomes*

H. Ghoneem,J. Wright,W. Cole,M. Rang

DOI: https://doi.org/10.1097/00004694-199703000-00031

1996-10-01

Abstract:We reviewed the psychological profile and functional ability of forty-five children (fifty-two extremities) who had had correction of deformities of the lower extremities with the Ilizarov method. Psychological changes were evaluated with the Post-Hospitalization Behavior Questionnaire and the Children's Depression Inventory, and the functional status was measured with the Children Health Information Service Rand Scale. The over-all satisfaction of the patient with regard to the outcome of the operation was assessed as well. The operations were performed between 1988 and 1992. The average age at the time of the operation was twelve years (range, three to eighteen years), and the average duration of follow-up was thirty-six months (range, twenty-four to seventy-two months). The lengthening index, duration of lengthening, and average number of complications were similar to those reported in other studies. All of the children had a normal psychological score, forty-two had no limitations in daily activities, and thirty-seven were satisfied with the over-all result.
Performance of three artificial intelligence (AI)‐based large language models in standardized testing; implications for AI‐assisted dental education

Hamoun Sabri,Muhammad H. A. Saleh,Parham Hazrati,Keith Merchant,Jonathan Misch,Purnima S. Kumar,Hom‐Lay Wang,Shayan Barootchi

DOI: https://doi.org/10.1111/jre.13323

2024-07-20

Journal of Periodontal Research

Abstract:ChatGPT‐4 outperforms ChatGPT‐3.5, Google Gemini, and human periodontics residents in standardized testing (AAP in‐service exams, 2020‐2023). This highlights the potential future role of AI in enhancing dental education. Introduction The emerging rise in novel computer technologies and automated data analytics has the potential to change the course of dental education. In line with our long‐term goal of harnessing the power of AI to augment didactic teaching, the objective of this study was to quantify and compare the accuracy of responses provided by ChatGPT (GPT‐4 and GPT‐3.5) and Google Gemini, the three primary large language models (LLMs), to human graduate students (control group) to the annual in‐service examination questions posed by the American Academy of Periodontology (AAP). Methods Under a comparative cross‐sectional study design, a corpus of 1312 questions from the annual in‐service examination of AAP administered between 2020 and 2023 were presented to the LLMs. Their responses were analyzed using chi‐square tests, and the performance was juxtaposed to the scores of periodontal residents from corresponding years, as the human control group. Additionally, two sub‐analyses were performed: one on the performance of the LLMs on each section of the exam; and in answering the most difficult questions. Results ChatGPT‐4 (total average: 79.57%) outperformed all human control groups as well as GPT‐3.5 and Google Gemini in all exam years (p

dentistry, oral surgery & medicine
Leveraging Large Language Models for Improved Patient Access and Self-Management: Assessor-Blinded Comparison Between Expert- and AI-Generated Content

Xiaolei Lv,Xiaomeng Zhang,Yuan Li,Xinxin Ding,Hongchang Lai,Junyu Shi

DOI: https://doi.org/10.2196/55847

2024-04-25

Abstract:Background: While large language models (LLMs) such as ChatGPT and Google Bard have shown significant promise in various fields, their broader impact on enhancing patient health care access and quality, particularly in specialized domains such as oral health, requires comprehensive evaluation. Objective: This study aims to assess the effectiveness of Google Bard, ChatGPT-3.5, and ChatGPT-4 in offering recommendations for common oral health issues, benchmarked against responses from human dental experts. Methods: This comparative analysis used 40 questions derived from patient surveys on prevalent oral diseases, which were executed in a simulated clinical environment. Responses, obtained from both human experts and LLMs, were subject to a blinded evaluation process by experienced dentists and lay users, focusing on readability, appropriateness, harmlessness, comprehensiveness, intent capture, and helpfulness. Additionally, the stability of artificial intelligence responses was also assessed by submitting each question 3 times under consistent conditions. Results: Google Bard excelled in readability but lagged in appropriateness when compared to human experts (mean 8.51, SD 0.37 vs mean 9.60, SD 0.33; P=.03). ChatGPT-3.5 and ChatGPT-4, however, performed comparably with human experts in terms of appropriateness (mean 8.96, SD 0.35 and mean 9.34, SD 0.47, respectively), with ChatGPT-4 demonstrating the highest stability and reliability. Furthermore, all 3 LLMs received superior harmlessness scores comparable to human experts, with lay users finding minimal differences in helpfulness and intent capture between the artificial intelligence models and human responses. Conclusions: LLMs, particularly ChatGPT-4, show potential in oral health care, providing patient-centric information for enhancing patient education and clinical care. The observed performance variations underscore the need for ongoing refinement and ethical considerations in health care settings. 
Future research focuses on developing strategies for the safe integration of LLMs in health care settings.
["Alternative" therapy methods in functional disorders of the gastrointestinal system].

M. Bittinger,J. Barnert,M. Wienbeck

1998-06-01

Abstract: Patients with functional disorders of the gastrointestinal tract often respond poorly to standard therapeutic regimens. Therefore, "alternative" forms of treatment (e.g. homoeopathy, acupuncture, phytotherapy, diet modifications, psychotherapy, hypnosis) often come into play. Critical assessment of these forms of therapy is difficult: placebo response is high in functional disorders of the gastrointestinal tract, and usually no placebo-controlled studies are available to prove the efficacy of these forms of therapy. To date, no data have proven the efficacy of homoeopathy and phytotherapy, and the efficacy of acupuncture has to be questioned. In contrast, hypnosis, psychotherapy and some forms of diet modification seem to be useful at least in some patients with functional disorders of the gastrointestinal tract.
Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment

Nikhil S Patil,Ryan S Huang,Christian B van der Pol,Natasha Larocque

DOI: https://doi.org/10.1177/08465371231193716

2023-08-16

Canadian Association of Radiologists Journal

Abstract: Purpose: Bard by Google, a direct competitor to ChatGPT, was recently released. Understanding the relative performance of these different chatbots can provide important insight into their strengths and weaknesses as well as which roles they are most suited to fill. In this project, we aimed to compare the most recent version of ChatGPT, ChatGPT-4, and Bard by Google, in their ability to accurately respond to radiology board examination practice questions. Methods: Text-based questions were collected from the 2017-2021 American College of Radiology's Diagnostic Radiology In-Training (DXIT) examinations. ChatGPT-4 and Bard were queried, and their comparative accuracies, response lengths, and response times were documented. Subspecialty-specific performance was analyzed as well. Results: 318 questions were included in our analysis. ChatGPT answered significantly more accurately than Bard (87.11% vs 70.44%, P < .0001). ChatGPT's response length was significantly shorter than Bard's (935.28 ± 440.88 characters vs 1437.52 ± 415.91 characters, P < .0001). ChatGPT's response time was significantly longer than Bard's (26.79 ± 3.27 seconds vs 7.55 ± 1.88 seconds, P < .0001). ChatGPT performed superiorly to Bard in neuroradiology (100.00% vs 86.21%, P = .03), general & physics (85.39% vs 68.54%, P < .001), nuclear medicine (80.00% vs 56.67%, P < .01), pediatric radiology (93.75% vs 68.75%, P = .03), and ultrasound (100.00% vs 63.64%, P < .001). In the remaining subspecialties, there were no significant differences between ChatGPT and Bard's performance. Conclusion: ChatGPT displayed superior radiology knowledge compared to Bard. While both chatbots display reasonable radiology knowledge, they should be used with conscious knowledge of their limitations and fallibility. Both chatbots provided incorrect or illogical answer explanations and did not always address the educational content of the question.

radiology, nuclear medicine & medical imaging
Artificial intelligence in dental education: ChatGPT's performance on the periodontic in‐service examination

Arman Danesh,Hirad Pazouki,Farzad Danesh,Arsalan Danesh,Saynur Vardar‐Sengul

DOI: https://doi.org/10.1002/jper.23-0514

2024-01-11

Journal of Periodontology

Abstract:Background ChatGPT's (Chat Generative Pre‐Trained Transformer) remarkable capacity to generate human‐like output makes it an appealing learning tool for healthcare students worldwide. Nevertheless, the chatbot's responses may be subject to inaccuracies, putting forth an intense risk of misinformation. ChatGPT's capabilities should be examined in every corner of healthcare education, including dentistry and its specialties, to understand the potential of misinformation associated with the chatbot's use as a learning tool. Our investigation aims to explore ChatGPT's foundation of knowledge in the field of periodontology by evaluating the chatbot's performance on questions obtained from an in‐service examination administered by the American Academy of Periodontology (AAP). Methods ChatGPT3.5 and ChatGPT4 were evaluated on 311 multiple‐choice questions obtained from the 2023 in‐service examination administered by the AAP. The dataset of in‐service examination questions was accessed through Nova Southeastern University's Department of Periodontology. Our study excluded questions containing an image as ChatGPT does not accept image inputs. Results ChatGPT3.5 and ChatGPT4 answered 57.9% and 73.6% of in‐service questions correctly on the 2023 Periodontics In‐Service Written Examination, respectively. A two‐tailed t test was incorporated to compare independent sample means, and sample proportions were compared using a two‐tailed χ2 test. A p value below the threshold of 0.05 was deemed statistically significant. Conclusion While ChatGPT4 showed a higher proficiency compared to ChatGPT3.5, both chatbot models leave considerable room for misinformation with their responses relating to periodontology. The findings of the study encourage residents to scrutinize the periodontic information generated by ChatGPT to account for the chatbot's current limitations.

dentistry, oral surgery & medicine
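
The periodontic in-service entry above says sample proportions were compared with a two-tailed χ2 test but reports only percentages. The sketch below shows how such a comparison could be computed for a 2x2 table. The counts 180 and 229 are back-calculated from 57.9% and 73.6% of 311 questions and are an assumption, not figures from the paper; the function name is hypothetical, and no continuity correction is applied.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    comparing two independent proportions. Returns (statistic, p_value)."""
    a, b = correct_a, total_a - correct_a  # group A: correct, incorrect
    c, d = correct_b, total_b - correct_b  # group B: correct, incorrect
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed counts: about 57.9% and 73.6% of 311 questions answered correctly.
chi2, p = chi_square_2x2(180, 311, 229, 311)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

Because both models answered the same questions, a paired test such as McNemar's could also be argued for; the independent-samples form is used here only because the abstract names a χ2 test on sample proportions.
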
Can artificial intelligence models serve as patient information consultants in orthodontics?

Derya Dursun,Rumeysa Bilici Geçer

DOI: https://doi.org/10.1186/s12911-024-02619-8

IF: 3.298

2024-07-30

BMC Medical Informatics and Decision Making

Abstract:To evaluate the accuracy, reliability, quality, and readability of responses generated by ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot in relation to orthodontic clear aligners.

medical informatics
Assessing the Accuracy, Completeness, and Reliability of Artificial Intelligence-Generated Responses in Dentistry: A Pilot Study Evaluating the ChatGPT Model

Kelly F Molena,Ana P Macedo,Anum Ijaz,Fabrício K Carvalho,Maria Julia D Gallo,Francisco Wanderley Garcia de Paula E Silva,Andiara de Rossi,Luis A Mezzomo,Leda Regina F Mugayar,Alexandra M Queiroz

DOI: https://doi.org/10.7759/cureus.65658

2024-07-29

Cureus

Abstract:Background: Artificial intelligence (AI) can be a tool in the diagnosis and acquisition of knowledge, particularly in dentistry, sparking debates on its application in clinical decision-making. Objective: This study aims to evaluate the accuracy, completeness, and reliability of the responses generated by Chatbot Generative Pre-Trained Transformer (ChatGPT) 3.5 in dentistry using expert-formulated questions. Materials and methods: Experts were invited to create three questions, answers, and respective references according to specialized fields of activity. The Likert scale was used to evaluate agreement levels between experts and ChatGPT responses. Statistical analysis compared descriptive and binary question groups in terms of accuracy and completeness. Questions with low accuracy underwent re-evaluation, and subsequent responses were compared for improvement. The Wilcoxon test was utilized (α = 0.05). Results: Ten experts across six dental specialties generated 30 binary and descriptive dental questions and references. The accuracy score had a median of 5.50 and a mean of 4.17. For completeness, the median was 2.00 and the mean was 2.07. No difference was observed between descriptive and binary responses for accuracy and completeness. However, re-evaluated responses showed a significant improvement with a significant difference in accuracy (median 5.50 vs. 6.00; mean 4.17 vs. 4.80; p=0.042) and completeness (median 2.0 vs. 2.0; mean 2.07 vs. 2.30; p=0.011). References were more incorrect than correct, with no differences between descriptive and binary questions. Conclusions: ChatGPT initially demonstrated good accuracy and completeness, which was further improved with machine learning (ML) over time. However, some inaccurate answers and references persisted. Human critical discernment continues to be essential to facing complex clinical cases and advancing theoretical knowledge and evidence-based practice.
Application of Adaptive Models for the Determination of the Thermal Behaviour of a Photovoltaic Panel

Valerio Lo Brano,G. Ciulla,M. Beccali

DOI: https://doi.org/10.1007/978-3-642-39643-4_26

2013-06-24

Abstract:
Cytogenetic alterations in swine kidney cells persistently infected with hog cholera virus and propagated with and without antiserum in the medium.

E. C. Pirtle,L. Woods

American Journal of Veterinary Research

Abstract:
LOCALIZATION OF RADIOACTIVE ELEMENTS : (A Review).

J. Gross,C. P. Leblond

Canadian Medical Association Journal

Abstract:
Comparative Assessment of Otolaryngology Knowledge Among Large Language Models

Dante J. Merlino,Santiago R. Brufau,George Saieed,Kathryn M. Van Abel,Daniel L. Price,David J. Archibald,Gregory A. Ator,Matthew L. Carlson

DOI: https://doi.org/10.1002/lary.31781

IF: 2.97

2024-09-23

The Laryngoscope

Abstract:This study assessed the baseline knowledge of advanced large language models (GPT‐3.5 and GPT‐4 by OpenAI; PaLM2 and MedPaLM by Google; LLama3:70b by Meta) in topics within otolaryngology—head and neck surgery, using a dataset of 4566 multiple choice, board‐style questions. The highest performing model, GPT‐4, correctly answered 77% of the time, while the lowest‐performing model, PaLM2, was correct on 56.5% of its responses; the free, open source model LLama3:70b correctly answered 66.8% of questions. Performance improved across models when asked to provide the reasoning behind their responses, with GPT‐4 correctly changing its incorrect answers to correct 31% of the time. Objective The purpose of this study was to evaluate the performance of advanced large language models from OpenAI (GPT‐3.5 and GPT‐4), Google (PaLM2 and MedPaLM), and an open source model from Meta (Llama3:70b) in answering clinical test multiple choice questions in the field of otolaryngology—head and neck surgery. Methods A dataset of 4566 otolaryngology questions was used; each model was provided a standardized prompt followed by a question. One hundred questions that were answered incorrectly by all models were further interrogated to gain insight into the causes of incorrect answers. Results GPT4 was the most accurate, correctly answering 3520 of 4566 questions (77.1%). MedPaLM correctly answered 3223 of 4566 (70.6%) questions, while llama3:70b, GPT3.5, and PaLM2 were correct on 3052 of 4566 (66.8%), 2672 of 4566 (58.5%), and 2583 of 4566 (56.5%) questions. Three hundred and sixty‐nine questions were answered incorrectly by all models. Prompts to provide reasoning improved accuracy in all models: GPT4 changed from incorrect to correct answer 31% of the time, while GPT3.5, Llama3, PaLM2, and MedPaLM corrected their responses 25%, 18%, 19%, and 17% of the time, respectively. Conclusion Large language models vary in their understanding of otolaryngology‐specific clinical knowledge. 
OpenAI's GPT4 has a strong understanding of core concepts as well as detailed information in the field of otolaryngology. Its baseline understanding in this field makes it well‐suited to serve in roles related to head and neck surgery education provided that the appropriate precautions are taken and potential limitations are understood. Level of Evidence N/A Laryngoscope, 2024

medicine, research & experimental,otorhinolaryngology
Comparing the Efficacy of Large Language Models ChatGPT, Bard, and Bing AI in Providing Information on Rhinoplasty: An Observational Study

Ishith Seth,Bryan Lim,Yi Xie,Jevan Cevik,Warren M Rozen,Richard J Ross,Mathew Lee

DOI: https://doi.org/10.1093/asjof/ojad084

2023-09-14

Aesthetic Surgery Journal Open Forum

Abstract: Background: Large language models (LLMs) are emerging artificial intelligence (AI) technology refining research and healthcare. The impact of these models on presurgical planning and education remains under-explored. Objectives: This study aims to assess 3 prominent LLMs – Google’s AI BARD (Mountain View, CA), Bing’s AI (Microsoft; Redmond, WA), and ChatGPT-3.5 (Open AI; San Francisco, CA) in providing safe medical information for rhinoplasty. Methods: Six questions regarding rhinoplasty were prompted to ChatGPT, BARD, and Bing AI. A Likert scale was used to evaluate these responses by a panel of Specialist Plastic and Reconstructive Surgeons with extensive experience in rhinoplasty. To measure reliability the Flesch Reading Ease Score, the Flesch-Kincaid Grade Level, and the Coleman-Liau Index were used. The modified DISCERN score was chosen as the criterion for assessing suitability and reliability. Student’s t-test was performed to calculate the difference between the LLMs and a double-sided P value < 0.05 was considered statistically significant. Results: Reliability-wise, BARD and ChatGPT demonstrated significantly (P<0.05) greater Flesch Reading Ease Score of 47.47 (±15.32) and 37.68 (±12.96), Flesch-Kincaid Grade Level of 9.7 (±3.12) and 10.15 (±1.84), and Coleman-Liau Index of 10.83 (±2.14) and 12.17 (±1.17) than Bing AI. Suitability-wise, BARD (46.3 ±2.8) demonstrated a significantly greater DISCERN score than ChatGPT and Bing AI. Likert score-wise, ChatGPT and BARD demonstrated similar scores and were greater than Bing AI. Conclusions: BARD delivered the most succinct and comprehensible information, followed by ChatGPT and BingAI. Although these models demonstrate potential, challenges remain regarding depth and specificity. Future research should aim to augment LLM performance through the integration of specialized databases and expert knowledge, while also refining their algorithms.
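
The three readability indices named in the rhinoplasty entry above are closed-form scores over simple text counts. The sketch below uses the standard published formulas. It takes the counts as inputs rather than parsing text, since syllable counting needs a heuristic or dictionary that the abstract does not describe; the function names and example counts are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(letters, words, sentences):
    """Coleman-Liau Index, based on letters and sentences per 100 words."""
    letters_per_100 = 100.0 * letters / words
    sentences_per_100 = 100.0 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Hypothetical counts for one chatbot response (not data from the study).
words, sentences, syllables, letters = 100, 5, 150, 450
print(round(flesch_reading_ease(words, sentences, syllables), 2))
print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
print(round(coleman_liau_index(letters, words, sentences), 2))
```

Note the opposite directions of the scales: a higher Reading Ease score means easier text, while higher Grade Level and Coleman-Liau values mean harder text.
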
Accuracy of Treatment Recommendations by Pragmatic Evidence Search and Artificial Intelligence: An Exploratory Study

Zunaira Baig,Daniel Lawrence,Mahen Ganhewa,Nicola Cirillo

DOI: https://doi.org/10.3390/diagnostics14050527

IF: 3.6

2024-03-02

Diagnostics

Abstract:There is extensive literature emerging in the field of dentistry with the aim to optimize clinical practice. Evidence-based guidelines (EBGs) are designed to collate diagnostic criteria and clinical treatment for a range of conditions based on high-quality evidence. Recently, advancements in Artificial Intelligence (AI) have instigated further queries into its applicability and integration into dentistry. Hence, the aim of this study was to develop a model that can be used to assess the accuracy of treatment recommendations for dental conditions generated by individual clinicians and the outcomes of AI outputs. For this pilot study, a Delphi panel of six experts led by CoTreat AI provided the definition and developed evidence-based recommendations for subgingival and supragingival calculus. For the rapid review (a pragmatic approach that aims to rapidly assess the evidence base using a systematic methodology), the Ovid Medline database was searched for subgingival and supragingival calculus. Studies were selected and reported based on the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA), and this study complied with the minimum requirements for completing a restricted systematic review. Treatment recommendations were also searched for these same conditions in ChatGPT (versions 3.5 and 4) and Bard (now Gemini). Adherence to the recommendations of the standard was assessed using qualitative content analysis and agreement scores for interrater reliability. Treatment recommendations by AI programs generally aligned with the current literature, with an agreement of up to 75%, although data sources were not provided by these tools, except for Bard. The clinician's rapid review results suggested several procedures that may increase the likelihood of overtreatment, as did GPT4. In terms of overall accuracy, GPT4 outperformed all other tools, including rapid review (Cohen's kappa 0.42 vs. 0.28). In summary, this study provides preliminary observations for the suitability of different evidence-generating methods to inform clinical dental practice.

medicine, general & internal

Accuracy and Completeness of Bard and Chat-GPT 4 Responses for Questions Derived from the International Consensus Statement on Endoscopic Skull-Base Surgery 2019

Yavar Abgin,Kayla Umemoto,Andrew Goulian,Missael Vasquez,Sean Polster,Arthur Wu,Christopher Roxbury,Pranay Soni,Omar G. Ahmed,Dennis M. Tang

DOI: https://doi.org/10.1055/a-2436-4222

2024-10-31

Journal of Neurological Surgery Part B Skull Base

Abstract:Artificial intelligence large language models (LLMs), such as Chat Generative Pre-Trained Transformer 4 (Chat-GPT) by OpenAI and Bard by Google, emerged in 2022 as tools for answering questions, providing information, and offering suggestions to the layperson. These LLMs impact how information is disseminated, and it is essential to compare their answers to those of experts in the corresponding field. The International Consensus Statement on Endoscopic Skull-Base Surgery 2019 (ICAR:SB) is a multidisciplinary international collaboration that critically evaluated and graded the current literature. Objectives: Evaluate the accuracy and completeness of Chat-GPT and Bard responses to questions derived from the ICAR:SB policy statements. Design: Thirty-four questions were created based on ICAR:SB policy statements and input into Chat-GPT and Bard. Two rhinologists and two neurosurgeons graded the accuracy and completeness of LLM responses, using a 5-point Likert scale. The Wilcoxon rank-sum and Kruskal–Wallis tests were used for analysis. Setting: Online. Participants: None. Outcomes: Compare the mean accuracy and completeness scores between (1) responses generated by Chat-GPT versus Bard and (2) rhinologists versus neurosurgeons. Results: Using the Wilcoxon rank-sum test, there were statistically significant differences in (1) accuracy (p < 0.001) and completeness (p < 0.001) of Chat-GPT compared with Bard; and (2) accuracy (p < 0.001) and completeness (p < 0.001) ratings between rhinologists and neurosurgeons. Conclusion: Chat-GPT responses are overall more accurate and complete than those of Bard, but both are very accurate and complete. Overall, rhinologists graded lower than neurosurgeons. Further research is needed to better understand the full potential of LLMs. Received: 07 March 2024. Accepted: 06 October 2024. Accepted Manuscript online: 08 October 2024. Article published online: 30 October 2024.

surgery,clinical neurology
