Enhancement of mucosal immune response against HIV-1 Gag by DNA immunization.

I. Yoshizawa,Y. Soda,T. Mizuochi,S. Yasuda,T. Rizvi,T. Takemori,Y. Tsunetsugu-Yokota

DOI: https://doi.org/10.1016/S0264-410X(00)00539-9

IF: 4.169

2001-04-06

Vaccine

Abstract:


A context-based chatbot surpasses trained radiologists and generic ChatGPT in following the ACR appropriateness guidelines

Alexander Rau,Stephan Rau,Anna Fink,Hien Tran,Caroline Wilpert,Johanna Nattenmueller,Jakob Neubauer,Fabian Bamberg,Marco Reisert,Maximilian F Russe

DOI: https://doi.org/10.1101/2023.04.10.23288354

2023-04-20

MedRxiv

Abstract:Background Radiological imaging guidelines are crucial for accurate diagnosis and optimal patient care as they result in standardized procedures and thus reduce inappropriate imaging studies. In the present study, we investigated the potential to support clinical decision-making using an interactive chatbot designed to provide personalized imaging recommendations based on indexed and vectorized American College of Radiology (ACR) appropriateness criteria documents. Methods We utilized 209 ACR appropriateness criteria documents as a specialized knowledge base and employed LlamaIndex and ChatGPT 3.5-Turbo to create an appropriateness criteria contexted chatbot (accGPT). Fifty clinical case files were used to compare the accGPT's performance against radiologists at varying experience levels and against generic ChatGPT 3.5 and 4.0. Results All chatbots reached at least human performance level. For the 50 case files, the accGPT provided a median of 83% (95% CI 82-84) 'usually appropriate' recommendations, while radiologists provided a median of 66% (95% CI 62-70). GPT-3.5-Turbo and GPT-4 provided 70% (95% CI 67-73) and 79% (95% CI 76-81) correct answers, respectively. Consistency was highest for the accGPT, with an almost perfect Fleiss' kappa of 0.82. Further, the chatbots provided substantial time and cost savings, with an average decision time of 5 minutes and a cost of 0.19 Euro for all cases, compared to 50 minutes and 29.99 Euro for radiologists (both p < 0.01). Conclusion ChatGPT-based algorithms have the potential to substantially improve the decision-making for clinical imaging studies in accordance with ACR guidelines. Specifically, a context-based algorithm outperformed its generic counterpart, demonstrating the value of tailoring AI solutions to specific healthcare applications.
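The abstract reports consistency as Fleiss' kappa (0.82 for the accGPT). As a minimal sketch of how that statistic is computed, the Python function below implements the standard Fleiss formula on a subjects × categories count table. It is not the authors' code. The abstract does not say how repeated chatbot runs or raters were arranged into such a table, so the input layout is an assumption. The function name and the example counts are illustrative only, not study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who placed subject i in category j. Every subject must
    have been rated by the same number of raters (at least 2)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need >= 2 raters and equal ratings per subject")

    total = n_subjects * n_raters
    # Share of all ratings that fell into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per subject: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects  # mean observed agreement
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


# Illustrative counts only (10 subjects, 14 raters, 5 categories), not study data.
example = [
    [0, 0, 0, 0, 14],
    [0, 2, 6, 4, 2],
    [0, 0, 3, 5, 6],
    [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1],
    [7, 7, 0, 0, 0],
    [3, 2, 6, 3, 0],
    [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0],
    [0, 2, 2, 3, 7],
]
print(round(fleiss_kappa(example), 3))
```

On the widely used Landis and Koch scale, values from 0.81 to 1.00 are labelled "almost perfect", which matches the wording the abstract uses for 0.82.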

Kaposi's sarcoma-associated herpesvirus-encoded LANA associates with glucocorticoid receptor and enhances its transcriptional activities.

S. Togi,Misa Nakasuji,R. Muromoto,O. Ikeda,Kan Okabe,Yuichi Kitai,S. Kon,K. Oritani,T. Matsuda

DOI: https://doi.org/10.1016/j.bbrc.2015.05.080

2015-07-31

Abstract:
ChatGPT v4 outperforming v3.5 on cancer treatment recommendations in quality, clinical guideline, and expert opinion concordance

Chung-You Tsai,Pai-Yu Cheng,Juinn-Horng Deng,Fu-Shan Jaw,Shyi-Chun Yii

DOI: https://doi.org/10.1177/20552076241269538

2024-08-14

Abstract:Objectives: To assess the quality and alignment of ChatGPT's cancer treatment recommendations (RECs) with National Comprehensive Cancer Network (NCCN) guidelines and expert opinions. Methods: Three urologists performed quantitative and qualitative assessments in October 2023 analyzing responses from ChatGPT-4 and ChatGPT-3.5 to 108 prostate, kidney, and bladder cancer prompts using two zero-shot prompt templates. Performance evaluation involved calculating five ratios: expert-approved/expert-disagreed and NCCN-aligned RECs against total ChatGPT RECs, plus coverage and adherence rates to NCCN. Experts rated the responses' quality on a 1-5 scale considering correctness, comprehensiveness, specificity, and appropriateness. Results: ChatGPT-4 outperformed ChatGPT-3.5 in prostate cancer inquiries, with an average word count of 317.3 versus 124.4 (p < 0.001) and 6.1 versus 3.9 RECs (p < 0.001). Its rater-approved REC ratio (96.1% vs. 89.4%) and alignment with NCCN guidelines (76.8% vs. 49.1%, p = 0.001) were superior, and it scored significantly better on all quality dimensions. Across 108 prompts covering three cancers, ChatGPT-4 produced an average of 6.0 RECs per case, with an 88.5% approval rate from raters, 86.7% NCCN concordance, and only a 9.5% disagreement rate. It achieved high marks in correctness (4.5), comprehensiveness (4.4), specificity (4.0), and appropriateness (4.4). Subgroup analyses across cancer types, disease statuses, and different prompt templates were reported. Conclusions: ChatGPT-4 demonstrated significant improvement in providing accurate and detailed treatment recommendations for urological cancers in line with clinical guidelines and expert opinion. However, it is vital to recognize that AI tools are not without flaws and should be utilized with caution. ChatGPT could supplement, but not replace, personalized advice from healthcare professionals.
Ultrastructure of cryopreserved horse embryos.

J. Wilson,T. Caceci,G. Potter,Duane C. Kraemer

Abstract:Embryos were recovered non-surgically at about Day 6 after ovulation from 15 Quarter horse-type mares and were evaluated for morphological changes which may occur because of exposure to the cryoprotectant and/or cryopreservation. Electron microscopy was used to elucidate the fine structure of intracellular organelles which, if damaged, could cause cellular death. The horse embryo does not totally re-expand in the 10% glycerol freezing medium, nor will it completely re-expand in the isotonic holding medium following glycerol removal whether or not the embryo has been frozen. Embryos in this study were frozen by the same protocol which had resulted in a 30% pregnancy rate for similarly frozen embryos. Junctional complexes between trophoblast cells, as well as the plasma and nuclear membranes of trophoblast and inner cell mass (ICM) cells, were intact after treatment in all embryos. Changes in lipid droplets and some mitochondrial degeneration were observed in the ICM cells of the glycerol-treated embryos. The change in the lipid was not observed in the frozen-thawed embryos, but mitochondrial changes were evident in the trophoblast and ICM cells, with the most extensive mitochondrial damage in the ICM cells.
A content-aware chatbot based on GPT 4 provides trustworthy recommendations for Cone-Beam CT guidelines in dental imaging

Maximilian Frederik Russe,Alexander Rau,Michael Andreas Ermer,René Rothweiler,Sina Wenger,Klara Klöble,Ralf K W Schulze,Fabian Bamberg,Rainer Schmelzeisen,Marco Reisert,Wiebke Semper-Hogg

DOI: https://doi.org/10.1093/dmfr/twad015

2024-01-05

Dentomaxillofacial Radiology

Abstract:Objectives To develop a content-aware chatbot based on GPT-3.5-Turbo and GPT-4 with specialized knowledge on the German S2 Cone-Beam CT (CBCT) dental imaging guideline and to compare its performance against humans. Methods The LlamaIndex software library was used to integrate the guideline context into the chatbots. Based on the CBCT S2 guideline, 40 questions were posed to the content-aware chatbots, with early career and senior practitioners with different levels of experience serving as reference. The chatbots’ performance was compared in terms of recommendation accuracy and explanation quality. Chi-square test and one-tailed Wilcoxon signed rank test evaluated accuracy and explanation quality, respectively. Results The GPT-4-based chatbot provided 100% correct recommendations and superior explanation quality compared to the one based on GPT-3.5-Turbo (87.5% vs. 57.5% for GPT-3.5-Turbo; P = .003). Moreover, it outperformed early career practitioners in correct answers (P = .002 and P = .032) and earned higher trust than the chatbot using GPT-3.5-Turbo (P = .006). Conclusions A content-aware chatbot using GPT-4 reliably provided recommendations according to current consensus guidelines. The responses were deemed trustworthy and transparent, and therefore facilitate the integration of artificial intelligence into clinical decision-making.
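The abstract says the guideline context was integrated with the LlamaIndex library but gives no implementation details. The sketch below shows the general retrieve-then-prompt idea behind such content-aware chatbots in plain Python. It splits the guideline into chunks, scores each chunk against the question, and places the best matches in front of the question. It deliberately does not use LlamaIndex's API. Word-count vectors stand in for the learned embeddings a real system would use. The helper names (`vectorize`, `retrieve`, `build_prompt`) are hypothetical, and the chunks are placeholders rather than text from the S2 CBCT guideline. Sending the prompt to GPT-4 is out of scope.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, chunks: list[str], top_k: int = 2) -> str:
    """Put the retrieved excerpts in front of the question so the model is
    asked to answer from the guideline text rather than from memory."""
    excerpts = "\n".join(f"- {c}" for c in retrieve(question, chunks, top_k))
    return (
        "Answer using only the guideline excerpts below.\n"
        f"Excerpts:\n{excerpts}\n\n"
        f"Question: {question}"
    )


guideline_chunks = [  # hypothetical placeholders, not real guideline text
    "Placeholder section on third molar imaging",
    "Placeholder section on implant planning",
    "Placeholder section on orthodontic imaging",
]
question = "When is imaging of a third molar indicated?"
print(build_prompt(question, guideline_chunks, top_k=1))
```

Restricting the model to retrieved excerpts is the design choice that distinguishes a content-aware chatbot from the generic one. How strictly the study's prompts enforced this is not stated in the abstract.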

dentistry, oral surgery & medicine,radiology, nuclear medicine & medical imaging
Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4

Adi Lahat,Kassem Sharif,Narmin Zoabi,Yonatan Shneor Patt,Yousra Sharif,Lior Fisher,Uria Shani,Mohamad Arow,Roni Levin,Eyal Klang

DOI: https://doi.org/10.2196/54571

2024-06-27

Abstract:Background: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. Objective: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types. Methods: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. Results: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. Conclusions: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. 
While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.
ChatGPT-4 Assistance in Optimizing Emergency Department Radiology Referrals and Imaging Selection

Yiftach Barash,Eyal Klang,Eli Konen,Vera Sorin

DOI: https://doi.org/10.1016/j.jacr.2023.06.009

IF: 6.24

2023-07-09

Journal of the American College of Radiology

Abstract:Objectives The quality of radiology referrals influences patient management and imaging interpretation by radiologists. This study aims to evaluate ChatGPT-4 as a decision support tool for selecting imaging examinations and generating radiology referrals in the emergency department (ED). Material and Methods We retrospectively extracted five consecutive clinical notes from the ED for each of the following pathologies: pulmonary embolism, obstructing kidney stones, acute appendicitis, diverticulitis, small bowel obstruction, acute cholecystitis, acute hip fracture, and testicular torsion. A total of 40 cases were included. We entered these notes into ChatGPT-4, requesting a recommendation on the most appropriate imaging examination and protocol. We also asked the chatbot to generate radiology referrals. Two independent radiologists graded the referrals on a 1-5 scale based on clarity, clinical relevance, and differential diagnosis. We compared the chatbot's imaging recommendations to the American College of Radiology Appropriateness Criteria (ACR AC) and to the examinations performed in the ED. Agreement between readers was assessed using the linearly weighted Cohen's kappa coefficient. Results ChatGPT-4's imaging recommendations aligned with ACR AC and ED examinations in all cases. Protocol discrepancies between ChatGPT and the ACR AC were observed in two cases (5%). ChatGPT-4-generated referrals received mean scores of 4.6 and 4.8 for clarity, 4.5 and 4.4 for clinical relevance, and 4.9 from both reviewers for differential diagnosis. Agreement between readers was moderate for clinical relevance and clarity, and substantial for differential diagnosis grading. Conclusion ChatGPT-4 has shown potential in aiding imaging study selection for select clinical cases. As a complementary tool, large language models may improve radiology referral quality. Radiologists should stay informed about this technology and be mindful of potential challenges and risks.
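Inter-reader agreement on the 1-5 referral grades was measured with a linearly weighted Cohen's kappa. Unlike plain kappa, this statistic gives partial credit when two readers differ by one scale point. The function below is a minimal sketch of the standard formula, κw = 1 − Σ w·observed / Σ w·expected with disagreement weights |i − j| / (k − 1). It is not the authors' implementation. The function name and the example reader scores are hypothetical.

```python
from typing import Hashable, Sequence


def linear_weighted_kappa(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Sequence[Hashable],
) -> float:
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1),
    where i and j are positions of the two ratings in the ordered
    `categories` list (e.g. [1, 2, 3, 4, 5] for a 1-5 scale)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of items")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed contingency table of (rater A category, rater B category).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[pos[a]][pos[b]] += 1
    row_tot = [sum(observed[i]) for i in range(k)]
    col_tot = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    obs_disagree = 0.0  # weighted disagreement actually observed
    exp_disagree = 0.0  # weighted disagreement expected from the marginals
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_disagree += w * observed[i][j] / n
            exp_disagree += w * row_tot[i] * col_tot[j] / (n * n)
    if exp_disagree == 0:
        raise ValueError("kappa undefined: expected disagreement is zero")
    return 1.0 - obs_disagree / exp_disagree


# Hypothetical clarity scores from two readers, not data from the study.
reader_1 = [5, 4, 5, 3, 4, 5]
reader_2 = [5, 5, 4, 3, 4, 5]
print(linear_weighted_kappa(reader_1, reader_2, [1, 2, 3, 4, 5]))
```

When nearly all grades sit at 4 or 5, as in this study, the expected disagreement is small. A few one-point differences can then pull kappa down noticeably. This is one reason high mean scores can coexist with only moderate agreement.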

radiology, nuclear medicine & medical imaging
[Indications and results of splenectomy in intermediate thalassemia].

A. M. Maniga,M. Longinotti,G. Dettori,P. Bacciu,G. Noya,G. Marongiu,G. Frassetto,P. Biglioli

1983-05-31

Minerva Chirurgica

Abstract:
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology

Yixing Huang,Ahmed Gomaa,Sabine Semrau,Marlen Haderlein,Sebastian Lettmaier,Thomas Weissmann,Johanna Grigo,Hassen Ben Tkhayat,Benjamin Frey,Udo S. Gaipl,Luitpold V. Distel,Andreas Maier,Rainer Fietkau,Christoph Bert,Florian Putz

2023-08-21

Abstract:The potential of large language models in medicine for education and decision making purposes has been demonstrated as they achieve decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA exam. In this work, we evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology using the 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases. For the TXIT exam, ChatGPT-3.5 and ChatGPT-4 have achieved the scores of 63.65% and 74.57%, respectively, highlighting the advantage of the latest ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology are identified to some extent. Specifically, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than knowledge of bone & soft tissue and gynecology, as per the ACR knowledge domain. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than brachytherapy and dosimetry. It lacks proficiency in in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 is able to suggest a personalized treatment approach to each case with high correctness and comprehensiveness. Importantly, it provides novel treatment aspects for many cases, which are not suggested by any human experts. Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as the potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Because of the risk of hallucination, facts provided by ChatGPT always need to be verified.

Medical Physics,Computation and Language
Artificial Intelligence Chatbots' Understanding of the Risks and Benefits of Computed Tomography and Magnetic Resonance Imaging Scenarios

Nikhil S. Patil,Ryan S. Huang,Scott Caterine,Jason Yao,Natasha Larocque,Christian B. van der Pol,Euan Stubbs

DOI: https://doi.org/10.1177/08465371231220561

2024-01-07

Canadian Association of Radiologists Journal

Abstract:Purpose: Patients may seek online information to better understand medical imaging procedures. The purpose of this study was to assess the accuracy of information provided by 2 popular artificial intelligence (AI) chatbots pertaining to common imaging scenarios' risks, benefits, and alternatives. Methods: Fourteen imaging-related scenarios pertaining to computed tomography (CT) or magnetic resonance imaging (MRI) were used. Factors including the use of intravenous contrast, the presence of renal disease, and whether the patient was pregnant were included in the analysis. For each scenario, 3 prompts for outlining the (1) risks, (2) benefits, and (3) alternative imaging choices or potential implications of not using contrast were entered into ChatGPT and Bard. A grading rubric and a 5-point Likert scale were used by 2 independent reviewers to grade responses. Prompt variability and chatbot context dependency were also assessed. Results: ChatGPT's performance was superior to Bard's in accurately responding to prompts per Likert grading (4.36 ± 0.63 vs 3.25 ± 1.03, P < .0001). There was substantial agreement between independent reviewer grading for ChatGPT (κ = 0.621) and Bard (κ = 0.684). Response text length was not statistically different between ChatGPT and Bard (2087 ± 256 characters vs 2162 ± 369 characters, P = .24). Response time was longer for ChatGPT (34 ± 2 vs 8 ± 1 seconds, P < .0001). Conclusions: ChatGPT outperformed Bard at outlining risks, benefits, and alternatives to common imaging scenarios. Generally, context dependency and prompt variability did not change chatbot response content. Due to the lack of detailed scientific reasoning and inability to provide patient-specific information, both AI chatbots have limitations as a patient information resource.

radiology, nuclear medicine & medical imaging
Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge

Cheng Jiao,Neel R. Edupuganti,Parth A. Patel,Tommy Bui,Veeral Sheth

DOI: https://doi.org/10.7759/cureus.45700

2023-09-21

Cureus

Abstract:OBJECTIVE: We aim to compare the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT)-3.5 and ChatGPT-4.0 (OpenAI, San Francisco, CA, USA) in addressing multiple-choice ophthalmic case challenges. METHODS AND ANALYSIS: Both models' accuracy was compared across different ophthalmology subspecialties using multiple-choice ophthalmic clinical cases provided by the American Academy of Ophthalmology (AAO) "Diagnosis This" questions. Additional analysis was based on image content, question difficulty, character length of the models' responses, and the models' alignment with responses from human respondents. χ2 test, Fisher's exact test, Student's t-test, and one-way analysis of variance (ANOVA) were conducted where appropriate, with p<0.05 considered significant. RESULTS: GPT-4.0 significantly outperformed GPT-3.5 (75% versus 46%, p<0.01), with the most noticeable improvement in neuro-ophthalmology (100% versus 38%, p=0.03). While both models struggled with uveitis and refractive questions, GPT-4.0 excelled in other areas, such as pediatric questions (82%). In image-related questions, GPT-4.0 also displayed superior accuracy that trended toward significance (73% versus 46%, p=0.07). GPT-4.0 performed better with easier questions (93.8% (least difficult) versus 76.2% (middle) versus 53.3% (most), p=0.03) and generated more concise answers than GPT-3.5 (651.7±342.9 versus 1,112.9±328.8 characters, p<0.01). Moreover, GPT-4.0's answers were more in line with those of AAO respondents (57.3% versus 41.4%, p<0.01), showing a strong correlation between its accuracy and the proportion of AAO respondents who selected GPT-4.0's answer (ρ=0.713, p<0.01). CONCLUSION AND RELEVANCE: Our study demonstrated that GPT-4.0 significantly outperforms GPT-3.5 in addressing ophthalmic case challenges, especially in neuro-ophthalmology, with improved accuracy even in image-related questions.
These findings underscore the potential of advancing artificial intelligence (AI) models in enhancing ophthalmic diagnostics and medical education.
Satisfaction with child care arrangements: effects on adaptation to parenthood.

Laurie Leventhal-Belfer,P. Cowan,C. Cowan

DOI: https://doi.org/10.1037/h0079331

1992-04-01

American Journal of Orthopsychiatry

Abstract:The relationships between parents' satisfaction with their child care arrangements and factors from four domains of a family systems model were examined for effects on adaptation to parenthood. Results suggest the desirability of including both mothers and fathers in child care interventions, public policy debates, and future research.
ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives

Pedram Keshavarz,Sara Bagherieh,Seyed Ali Nabipoorashrafi,Hamid Chalian,Amir Ali Rahsepar,Grace Hyun J. Kim,Cameron Hassani,Steven S. Raman,Arash Bedayat

DOI: https://doi.org/10.1016/j.diii.2024.04.003

IF: 7.242

2024-04-30

Diagnostic and Interventional Imaging

Abstract:Purpose The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications. Materials and methods After a comprehensive review of PubMed, Web of Science, Embase, and Google Scholar databases, a cohort of published studies was identified up to January 1, 2024, utilizing ChatGPT for clinical radiology applications. Results Out of 861 studies derived, 44 studies evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated it had a lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported the proportion of ChatGPT's performance. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and in five (5/24; 20.8%) studies, there was a median agreement of 83.6% between ChatGPT outcomes and reference standards [radiologists' decision or guidelines], generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPTv4 outperformed v3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks. Conclusion Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. 
It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.

radiology, nuclear medicine & medical imaging
Evaluating the performance and clinical decision‐making impact of ChatGPT‐4 in reproductive medicine

Rouzhu Chen,Danling Zeng,Yi Li,Rui Huang,Dejuan Sun,Tingting Li

DOI: https://doi.org/10.1002/ijgo.15959

2024-11-12

International Journal of Gynecology & Obstetrics

Abstract:Background ChatGPT, a sophisticated language model developed by OpenAI, has the potential to offer professional and patient‐friendly support. We aimed to assess the accuracy and reproducibility of ChatGPT‐4 in answering questions related to knowledge, management, and support within the field of reproductive medicine. Methods ChatGPT‐4 was used to respond to queries sourced from a domestic attending physician examination database, as well as to address both local and international treatment guidelines within the field of reproductive medicine. Each response generated by ChatGPT‐4 was independently evaluated by a trio of experts specializing in reproductive medicine. The experts used four qualitative measures—relevance, accuracy, completeness, and understandability—to assess each response. Results We found that ChatGPT‐4 demonstrated extensive knowledge in reproductive medicine, with median scores for relevance, accuracy, completeness, and comprehensibility of objective questions being 4, 3.5, 3, and 3, respectively. However, the composite accuracy rate for multiple‐choice questions was 63.38%. Significant discrepancies were observed among the three experts' scores across all four measures. Expert 1 generally provided higher and more consistent scores, while Expert 3 awarded lower scores for accuracy. ChatGPT‐4's responses to both domestic and international guidelines showed varying levels of understanding, with a lack of knowledge on regional guideline variations. However, it offered practical and multifaceted advice regarding next steps and adjusting to new guidelines. Conclusions We analyzed the strengths and limitations of ChatGPT‐4's responses on the management of reproductive medicine and relevant support. ChatGPT‐4 might serve as a supplementary informational tool for patients and physicians to improve outcomes in the field of reproductive medicine.

obstetrics & gynecology
What is the most effective management of neurogenic bladder dysfunction?

B. Buckley,A. Grant

DOI: https://doi.org/10.1136/bmj.b659

2009-03-12

British Medical Journal

Abstract:Neural damage from injury to the spinal cord or neurological conditions, such as multiple sclerosis or spina bifida, may cause neurogenic bladder dysfunction; this affects the ability to retain or to void urine, or both. Considerable uncertainty exists regarding the most effective management of neurogenic bladder dysfunction, which aims to prevent urinary tract infection and preserve upper urinary tract health, continence, and quality of life.[1,2] The effect of incontinence on quality of life can hardly be overestimated.[3] Management is usually long term, complex, and determined by the underlying neurological abnormality. It is often guided by urodynamic evaluation of lower urinary tract function. The patient’s preferences and availability of help from carers also affect management choices and must be carefully assessed. Management options depend on the patient’s characteristics but include indwelling urethral or suprapubic catheterisation; intermittent catheterisation; urine collection by an external device; sacral nerve stimulation; and surgical techniques such as urinary diversion, bladder augmentation or substitution, sphincterotomy, and artificial sphincter implantation. But which technique is the most effective? For instance, which catheter policy results in the fewest urinary tract infections? Intermittent catheterisation has several advantages over other techniques: fewer infections, reduced equipment needs, and greater independence.[4] However, the patient or carer must be able and willing to perform the procedure, so it may not always be practicable. Indwelling catheters have become a …
Current applications and future potential of ChatGPT in radiology: A systematic review

Hugo C Temperley,Niall J O'Sullivan,Benjamin M Mac Curtain,Alison Corr,James F Meaney,Michael E Kelly,Ian Brennan

DOI: https://doi.org/10.1111/1754-9485.13621

2024-01-21

Journal of Medical Imaging and Radiation Oncology

Abstract:Summary This study aimed to comprehensively evaluate the current utilization and future potential of ChatGPT, an AI‐based chat model, in the field of radiology. The primary focus is on its role in enhancing decision‐making processes, optimizing workflow efficiency, and fostering interdisciplinary collaboration and teaching within healthcare. A systematic search was conducted in PubMed, EMBASE and Web of Science databases. Key aspects, such as its impact on complex decision‐making, workflow enhancement and collaboration, were assessed. Limitations and challenges associated with ChatGPT implementation were also examined. Overall, six studies met the inclusion criteria and were included in our analysis. All studies were prospective in nature. A total of 551 ChatGPT (version 3.0 to 4.0) assessment events were included in our analysis. Considering the generation of academic papers, ChatGPT was found to output data inaccuracies 80% of the time. When ChatGPT was asked questions regarding common interventional radiology procedures, its answers contained entirely incorrect information 45% of the time. ChatGPT was seen to better answer US board‐style questions when lower order thinking was required (P = 0.002). Improvements were seen between ChatGPT 3.5 and 4.0 in regard to imaging questions, with accuracy rates of 61% versus 85% (P = 0.009). ChatGPT was observed to have an average translational ability score of 4.27/5 on the Likert scale regarding CT and MRI findings. ChatGPT demonstrates substantial potential to augment decision‐making and optimize workflow. While ChatGPT's promise is evident, thorough evaluation and validation are imperative before widespread adoption in the field of radiology.

radiology, nuclear medicine & medical imaging
Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment

Damien Gibson,Stuart Jackson,Ramesh Shanmugasundaram,Ishith Seth,Adrian Siu,Nariman Ahmadi,Jonathan Kam,Nicholas Mehan,Ruban Thanigasalam,Nicola Jeffery,Manish I Patel,Scott Leslie

DOI: https://doi.org/10.2196/55939

2024-08-14

Abstract:Background: Artificial intelligence (AI) chatbots, such as ChatGPT, have made significant progress. These chatbots, particularly popular among health care professionals and patients, are transforming patient education and disease experience with personalized information. Accurate, timely patient education is crucial for informed decision-making, especially regarding prostate-specific antigen screening and treatment options. However, the accuracy and reliability of AI chatbots' medical information must be rigorously evaluated. Studies testing ChatGPT's knowledge of prostate cancer are emerging, but there is a need for ongoing evaluation to ensure the quality and safety of information provided to patients. Objective: This study aims to evaluate the quality, accuracy, and readability of ChatGPT-4's responses to common prostate cancer questions posed by patients. Methods: Overall, 8 questions were formulated with an inductive approach based on information topics in peer-reviewed literature and Google Trends data. Adapted versions of the Patient Education Materials Assessment Tool for AI (PEMAT-AI), Global Quality Score, and DISCERN-AI tools were used by 4 independent reviewers to assess the quality of the AI responses. The 8 AI outputs were judged by 7 expert urologists, using an assessment framework developed to assess accuracy, safety, appropriateness, actionability, and effectiveness. The AI responses' readability was assessed using established algorithms (Flesch Reading Ease score, Gunning Fog Index, Flesch-Kincaid Grade Level, The Coleman-Liau Index, and Simple Measure of Gobbledygook [SMOG] Index). A brief tool (Reference Assessment AI [REF-AI]) was developed to analyze the references provided by AI outputs, assessing for reference hallucination, relevance, and quality of references. 
Results: The PEMAT-AI understandability score was very good (mean 79.44%, SD 10.44%), the DISCERN-AI rating was scored as "good" quality (mean 13.88, SD 0.93), and the Global Quality Score was high (mean 4.46/5, SD 0.50). Natural Language Assessment Tool for AI had pooled mean accuracy of 3.96 (SD 0.91), safety of 4.32 (SD 0.86), appropriateness of 4.45 (SD 0.81), actionability of 4.05 (SD 1.15), and effectiveness of 4.09 (SD 0.98). The readability algorithm consensus was "difficult to read" (Flesch Reading Ease score mean 45.97, SD 8.69; Gunning Fog Index mean 14.55, SD 4.79), averaging an 11th-grade reading level, equivalent to 15- to 17-year-olds (Flesch-Kincaid Grade Level mean 12.12, SD 4.34; The Coleman-Liau Index mean 12.75, SD 1.98; SMOG Index mean 11.06, SD 3.20). REF-AI identified 2 reference hallucinations, while the majority (28/30, 93%) of references appropriately supplemented the text. Most references (26/30, 86%) were from reputable government organizations, while a handful were direct citations from scientific literature. Conclusions: Our analysis found that ChatGPT-4 provides generally good responses to common prostate cancer queries, making it a potentially valuable tool for patient education in prostate cancer care. Objective quality assessment tools indicated that the natural language processing outputs were generally reliable and appropriate, but there is room for improvement.
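The readability indices named in this abstract are fixed formulas over simple text counts. The sketch below writes them out in Python using the commonly published coefficients. The counting step (sentences, syllables, words of three or more syllables, letters) is left as input because tools implement it differently. The study does not say which calculator it used. The function names and the example counts are hypothetical.

```python
import math

# Commonly published coefficients for each index. Counting is left to the
# caller; tools differ in how they count syllables and sentences, so scores
# from different calculators may not match exactly.


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher is easier; roughly 30-50 is conventionally read as 'difficult'."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """complex_words: roughly, words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog_index(polysyllables: int, sentences: int) -> float:
    """polysyllables: words of three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def coleman_liau(letters: int, words: int, sentences: int) -> float:
    """Uses letters and sentences per 100 words instead of syllables."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Hypothetical counts for one chatbot answer, not study data.
print(flesch_reading_ease(words=100, sentences=5, syllables=150))
print(flesch_kincaid_grade(words=100, sentences=5, syllables=150))
```

Long sentences and polysyllabic vocabulary push every index toward "harder". The abstract's mean Flesch Reading Ease of about 46 falls in the conventional "difficult" band for that reason.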
Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for ai-assisted medical education and decision making in radiation oncology

Yixing Huang,Ahmed Gomaa,Sabine Semrau,Marlen Haderlein,Sebastian Lettmaier,Thomas Weissmann,Johanna Grigo,Hassen Ben Tkhayat,Benjamin Frey,Udo Gaipl,Luitpold Distel,Andreas Maier,Rainer Fietkau,Christoph Bert,Florian Putz

DOI: https://doi.org/10.3389/fonc.2023.1265024

IF: 4.7

2023-09-15

Frontiers in Oncology

Abstract:Purpose: The potential of large language models for education and decision-making in medicine has been demonstrated by their decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA exam. This work aims to evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology. Methods: The 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases are used to benchmark the performance of ChatGPT-4. The TXIT exam contains 300 questions covering various topics of radiation oncology. The 2022 Gray Zone collection contains 15 complex clinical cases. Results: On the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 62.05% and 78.77%, respectively, highlighting the advantage of the newer ChatGPT-4 model. The TXIT results also allow ChatGPT-4's strong and weak areas in radiation oncology to be identified to some extent. Specifically, by ACR knowledge domain, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry. It lacks proficiency in the in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 is able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness. Importantly, for many cases it proposes novel treatment aspects that were not suggested by any of the human experts. Conclusion: Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as its potential to aid clinical decision-making, while acknowledging its limitations in certain domains.
Owing to the risk of hallucinations, it is essential to verify the content generated by models such as ChatGPT for accuracy.
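Identifying strong and weak areas from an exam like TXIT amounts to grouping graded questions by knowledge domain and comparing per-domain accuracy. The sketch below assumes each question has already been graded and tagged with a single ACR domain label. The function names and the toy records in the usage block are hypothetical illustrations, not the study's data or code.

```python
from collections import defaultdict
from typing import Iterable


def accuracy_by_domain(graded: Iterable[tuple[str, bool]]) -> dict[str, tuple[float, int]]:
    """Map each domain to (accuracy, number of questions in that domain)."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for domain, is_correct in graded:
        totals[domain] += 1
        correct[domain] += int(is_correct)
    return {d: (correct[d] / totals[d], totals[d]) for d in totals}


def rank_domains(graded: list[tuple[str, bool]]) -> list[str]:
    """Domains ordered from strongest to weakest accuracy."""
    stats = accuracy_by_domain(graded)
    return sorted(stats, key=lambda d: stats[d][0], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: (ACR domain, answered correctly?)
    toy = [
        ("physics", True), ("physics", True), ("physics", False),
        ("gynecology", True), ("gynecology", False),
        ("statistics", True),
    ]
    stats = accuracy_by_domain(toy)
    for domain in rank_domains(toy):
        acc, n = stats[domain]
        print(f"{domain:12s} {acc:.2f} (n={n})")
```

Returning the question count alongside each accuracy is deliberate: a domain with only a handful of questions yields a noisy estimate, so rankings between small domains should be read cautiously.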

oncology
Performance and exploration of ChatGPT in medical examination, records and education in Chinese: Pave the way for medical AI

Hongyan Wang,WeiZhen Wu,Zhi Dou,Liangliang He,Liqiang Yang

DOI: https://doi.org/10.1016/j.ijmedinf.2023.105173

IF: 4.73

2023-08-07

International Journal of Medical Informatics

Abstract:Background Although Chat Generative Pre-trained Transformer (ChatGPT) has made several successful attempts in the medical field, most notably in answering medical questions in English, no studies have evaluated ChatGPT's performance on a medical task in a Chinese context. Objective The aim of this study was to evaluate ChatGPT's ability to understand medical knowledge in Chinese, as well as its potential to serve as an electronic health infrastructure for medical development, by evaluating its performance in medical examinations, records, and education. Method The Chinese (CNMLE) and English (ENMLE) datasets of the China National Medical Licensing Examination and the Chinese dataset (NEEPM) of the China National Entrance Examination for Postgraduate Clinical Medicine Comprehensive Ability were used to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4). We assessed answer accuracy, verbal fluency, and the classification of hallucination-related incorrect responses on multiple occasions. In addition, we tested ChatGPT's performance on discharge summaries and group learning in a Chinese context on a small scale. Results The accuracy of GPT-3.5 in CNMLE, ENMLE, and NEEPM was 56% (56/100), 76% (76/100), and 62% (62/100), respectively, compared with 84% (84/100), 86% (86/100), and 82% (82/100) for GPT-4. The verbal fluency of all the ChatGPT responses exceeded 95%. Among the GPT-3.5 incorrect responses, the proportions of open-domain hallucinations were 66% (29/44), 54% (14/24), and 63% (24/38), whereas closed-domain hallucinations accounted for 34% (15/44), 46% (14/24), and 37% (14/38), respectively. By contrast, GPT-4 open-domain hallucinations accounted for 56% (9/16), 43% (6/14), and 83% (15/18), while closed-domain hallucinations accounted for 44% (7/16), 57% (8/14), and 17% (3/18), respectively.
In the discharge summary task, ChatGPT demonstrated logical coherence; however, GPT-3.5 could not fulfill the quality requirements, while GPT-4 met them in 60% (6/10) of cases. In group learning, the verbal fluency and interaction satisfaction with ChatGPT were 100% (10/10). Conclusion ChatGPT based on GPT-4 is on par with Chinese medical practitioners who passed the CNMLE and meets the standard required for admission to clinical medical graduate programs in China. GPT-4 shows promising potential for discharge summarization and group learning. Additionally, it shows high verbal fluency, resulting in a positive human–computer interaction experience. GPT-4 significantly improves multiple capabilities and reduces hallucinations compared with the previous GPT-3.5 model, with a particular leap forward in Chinese comprehension of medical tasks. Artificial intelligence (AI) systems face the challenges of hallucinations, legal risks, and ethical issues. However, we discovered ChatGPT's potential to promote medical development as an electronic health infrastructure, paving the way for medical AI to become a necessity.
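The hallucination percentages above are shares of each dataset's incorrect responses, so their denominators follow from the accuracy figures: with 100 questions per dataset, GPT-4's 84, 86, and 82 correct answers leave 16, 14, and 18 incorrect responses. The short check below recomputes GPT-4's open- and closed-domain shares from the counts reported in the abstract; the dictionary layout and function name are illustrative, not from the paper.

```python
QUESTIONS_PER_DATASET = 100

# GPT-4 counts from the abstract:
# (correct answers, open-domain hallucinations, closed-domain hallucinations)
GPT4_COUNTS = {
    "CNMLE": (84, 9, 7),
    "ENMLE": (86, 6, 8),
    "NEEPM": (82, 15, 3),
}


def hallucination_shares(correct: int, open_domain: int, closed_domain: int,
                         n: int = QUESTIONS_PER_DATASET) -> tuple[int, int, int]:
    """Return (incorrect responses, open-domain %, closed-domain %)."""
    incorrect = n - correct
    # Every incorrect response is classified as exactly one hallucination type.
    if open_domain + closed_domain != incorrect:
        raise ValueError("open- and closed-domain counts must sum to the incorrect responses")
    return (
        incorrect,
        round(100 * open_domain / incorrect),
        round(100 * closed_domain / incorrect),
    )


if __name__ == "__main__":
    for name, counts in GPT4_COUNTS.items():
        incorrect, open_pct, closed_pct = hallucination_shares(*counts)
        print(f"{name}: {incorrect} incorrect, open {open_pct}%, closed {closed_pct}%")
```

The consistency check raises an error when the two hallucination counts do not add up to the number of incorrect responses, which is a quick way to catch transcription slips in reported fractions.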

health care sciences & services,computer science, information systems,medical informatics
Exploring the Potential of ChatGPT-4 for Clinical Decision Support in Cardiac Electrophysiology and Its Semi-Automatic Evaluation Metrics

Xiarepati Tieliwaerdi,Abulikemu Abuduweili,Saleh Saleh,Erasmus Mutabi,Michael A Rosenberg,Emerson Liu

DOI: https://doi.org/10.1101/2024.07.10.24310247

2024-07-12

Abstract:Background/Aim: Despite extensive research in other medical fields, the capabilities of ChatGPT-4 in clinical decision support within cardiac electrophysiology (EP) remain largely unexplored. This study aims to enhance ChatGPT-4's domain-specific expertise by employing the Retrieval-Augmented Generation (RAG) approach, which supplements ChatGPT-4's built-in knowledge with up-to-date, evidence-based information retrieved at query time. Additionally, we plan to explore the use of automatic evaluation metrics commonly used in natural language processing, such as BERTScore, BLEURT, and cosine similarity, alongside human evaluation, to develop a semi-automatic evaluation framework. This aims to reduce dependency on exhaustive human evaluations, addressing the need for efficient and scalable assessment tools in medical decision-making given the rapid adoption of ChatGPT-4 by the public. Method: We analyzed five atrial fibrillation (Afib) cases and seven cardiac implantable electronic device (CIED) infection cases curated from PubMed case reports. We conducted a total of 120 experiments for Afib and 168 for CIED cases, testing each case across four temperature settings (0, 0.5, 1, 1.2) and three seed settings (1, 2, 3). ChatGPT-4's performance was assessed under two modes: the RAG mode and the Cold Turkey mode, which queries ChatGPT without external knowledge supplied via RAG. For Afib cases, ChatGPT was asked to determine rate, rhythm, and anticoagulation options and to provide reasoning for each. For CIED cases, ChatGPT was asked to determine the presence of device infection. Accuracy metrics evaluated the determination component, while reasoning was assessed by human evaluation, BERTScore, BLEURT, and cosine similarity. A mixed-effects analysis was used to compare performance under both modes across varying seeds and temperatures. Spearman's rank correlation was used to explore the relationship between the automatic metrics and human evaluation.
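The experiment counts are consistent with a full factorial design. Reading the Methods as every case crossed with four temperatures, three seeds, and two modes gives 5 × 4 × 3 × 2 = 120 Afib runs and 7 × 4 × 3 × 2 = 168 CIED runs, matching the reported totals; that factorization is our inference, not an explicit statement in the abstract. The sketch below only enumerates run configurations under that assumption. It does not call any model API, and the field and function names are hypothetical.

```python
from itertools import product

TEMPERATURES = (0, 0.5, 1, 1.2)
SEEDS = (1, 2, 3)
MODES = ("RAG", "Cold Turkey")
CASE_COUNTS = {"Afib": 5, "CIED": 7}


def experiment_grid(task: str, n_cases: int) -> list[dict]:
    """One run configuration per (case, temperature, seed, mode) combination."""
    return [
        {"task": task, "case": case, "temperature": t, "seed": s, "mode": m}
        for case, t, s, m in product(range(1, n_cases + 1), TEMPERATURES, SEEDS, MODES)
    ]


if __name__ == "__main__":
    for task, n in CASE_COUNTS.items():
        print(task, len(experiment_grid(task, n)))
```

Keeping the grid explicit makes it straightforward to attach each run's scores to its factors (case, temperature, seed, mode) for the mixed-effects comparison the abstract describes.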
Results: In this study, 120 experiments for Afib and 168 for CIED were conducted. There was no significant difference between the RAG mode and the Cold Turkey mode across metrics including determination accuracy, reasoning similarity, and human evaluation scores. RAG achieved higher cosine similarity scores in Afib cases (0.82 vs. 0.75) and better accuracy in CIED cases (0.70 vs. 0.66), but these differences were not statistically significant given the small sample size. Our mixed-effects analysis revealed no significant effects of temperature or method interactions, indicating stable performance across these variables. Moreover, no individual evaluation metric, such as BERTScore, BLEURT, or cosine similarity, showed a high correlation with human evaluations. However, the ACC-Sim metric, which averages accuracy and cosine similarity, exhibited the highest correlation with human evaluation (Spearman's ρ = 0.86, P < 0.001), indicating a significant ordinal correlation between ACC-Sim and human evaluation. This suggests its potential as a surrogate for human evaluation in similar medical scenarios. Conclusion: Our study did not find a significant difference between the RAG and Cold Turkey methods in terms of ChatGPT-4's clinical decision-making performance in Afib and CIED infection management. The ACC-Sim metric closely aligns with human evaluations in these specific medical contexts and shows promise for integration into a semi-automatic evaluation framework.
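ACC-Sim is described only as averaging accuracy and cosine similarity, and its agreement with human ratings is measured with Spearman's rank correlation. A dependency-free sketch of both follows. It assumes ACC-Sim is the unweighted mean of an accuracy value and a cosine similarity for the same case, and that the cosine similarity is taken between embedding vectors of the model's reasoning and a reference answer. The embedding model is not specified in the abstract, so vectors are plain inputs here, and all function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def acc_sim(accuracy: float, cos_sim: float) -> float:
    """Assumed ACC-Sim: unweighted mean of accuracy and cosine similarity."""
    return (accuracy + cos_sim) / 2


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return pearson(average_ranks(x), average_ranks(y))
```

Because Spearman's ρ depends only on ranks, it rewards a metric that orders cases the same way human raters do, even if the two scales differ. For a p-value as well as ρ, `scipy.stats.spearmanr` is the usual library route; the hand-rolled version here just makes the average-rank tie handling explicit.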

Enhancement of mucosal immune response against HIV-1 Gag by DNA immunization.

A context-based chatbot surpasses trained radiologists and generic ChatGPT in following the ACR appropriateness guidelines

Kaposi's sarcoma-associated herpesvirus-encoded LANA associates with glucocorticoid receptor and enhances its transcriptional activities.

ChatGPT v4 outperforming v3.5 on cancer treatment recommendations in quality, clinical guideline, and expert opinion concordance

Ultrastructure of cryopreserved horse embryos.

A content-aware chatbot based on GPT 4 provides trustworthy recommendations for Cone-Beam CT guidelines in dental imaging

Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4

ChatGPT-4 Assistance in Optimizing Emergency Department Radiology Referrals and Imaging Selection

[Indications and results of splenectomy in intermediate thalassemia].

Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology

Artificial Intelligence Chatbots' Understanding of the Risks and Benefits of Computed Tomography and Magnetic Resonance Imaging Scenarios

Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge

Satisfaction with child care arrangements: effects on adaptation to parenthood.

ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives

Evaluating the performance and clinical decision‐making impact of ChatGPT‐4 in reproductive medicine

What is the most effective management of neurogenic bladder dysfunction?

Current applications and future potential of ChatGPT in radiology: A systematic review

Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment

Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for ai-assisted medical education and decision making in radiation oncology

Performance and exploration of ChatGPT in medical examination, records and education in Chinese: Pave the way for medical AI

Exploring the Potential of ChatGPT-4 for Clinical Decision Support in Cardiac Electrophysiology and Its Semi-Automatic Evaluation Metrics