Leveraging Large Language Models in Gynecologic Oncology: A Systematic Review of Current Applications and Challenges

Aya Mudrik, Abraham Tsur, Girish Nadkarni, Orly Efros, Benjamin S Glicksberg, Shelly Soffer, Eyal Klang

DOI: https://doi.org/10.1101/2024.08.08.24311699

2024-08-09

Abstract: Rationale and Objectives: Over the past year, studies have been conducted to evaluate the performance of Large Language Models (LLMs), such as ChatGPT, in the field of gynecologic oncology. This review aims to analyze the applications and risks associated with using LLMs in this specialized field. Materials and Methods: This systematic review was performed in adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, incorporating elements from the diagnostic test accuracy extension and the CHARMS checklist for reviews of prediction models. A systematic literature search was executed on July 17, 2024, across the PubMed, Web of Science, and Scopus databases. We focused on identifying original research that integrates LLMs with gynecologic oncology. We assessed the risk of bias using the adapted QUADAS-2 criteria. Results: Our search identified eight studies that met our criteria, focusing on healthcare education, clinical practice, and medical code generation. These studies revealed variability in ChatGPT's performance across different applications. It excelled in genetic testing and counseling, achieving a 97% accuracy rate. However, its performance in cervical cancer prevention was less robust, with an accuracy of 83%. While one study demonstrated ChatGPT's high adherence to quality guidelines, another noted that established guidelines significantly outperformed ChatGPT's outputs. Additionally, code generation using tools like Google Bard and RoBERTa has shown potential to improve accuracy in clinical predictions and quality assurance. For example, Natural Language Processing (NLP) assisted by RoBERTa (based on Google's BERT model) has improved the prediction of residual disease in women with advanced epithelial ovarian cancer following cytoreductive surgery. Despite these advancements, challenges related to consistency, specificity, and personalization persist, underscoring the necessity for continuous enhancement of these technologies. Conclusion: LLMs demonstrate inconsistent performance in gynecologic oncology. These findings emphasize the need for continuous evaluation of these models before they are implemented clinically.

Obstetrics and Gynecology

What problem does this paper attempt to address?

This paper analyzes the applications of large language models (LLMs) in gynecologic oncology and their associated risks. Specifically, the researchers evaluate the performance of these models in medical education, clinical practice, and medical code generation through a systematic review of current research, and explore their potential and limitations in this specialized field.

### Problems the paper attempts to solve:

1. **Evaluate the application of LLMs in gynecologic oncology**: Understand how LLMs are used in different aspects of gynecologic oncology, such as medical education, clinical practice, and medical code generation.
2. **Analyze the performance of LLMs in gynecologic oncology**: Through a systematic review of existing research, evaluate the performance of these models in different application scenarios, including accuracy, consistency, and the ability to personalize.
3. **Identify the challenges of LLMs in gynecologic oncology**: Explore the problems that these models may encounter in practical applications, such as data privacy, model interpretability, and integration with existing medical systems.
4. **Propose directions for future research**: Based on the findings of current research, point out the key areas for future research to further improve the effectiveness of LLMs in gynecologic oncology.

### Main research methods:

- **Systematic review**: Follow the PRISMA guidelines to conduct a systematic literature search in the PubMed, Web of Science, and Scopus databases, and select original studies that meet the inclusion criteria.
- **Risk-of-bias assessment**: Use the adapted QUADAS-2 criteria to assess the risk of bias in the included studies.
- **Data synthesis**: Because the study designs and results are heterogeneous, use narrative synthesis rather than meta-analysis to summarize the diverse applications, advantages, and challenges of LLMs in gynecologic oncology.

### Main findings:

- **Medical education**: ChatGPT showed high accuracy (97%) when answering genetic testing and counseling questions, but was less robust (83% accuracy) on cervical cancer prevention questions.
- **Clinical practice**: The performance of LLMs in providing treatment recommendations is uneven and sometimes lacks personalized assessment.
- **Medical code generation**: LLMs show potential in generating medical codes and can improve the efficiency of quality assurance audits; RoBERTa-assisted NLP improved the prediction of residual disease after cytoreductive surgery.

### Conclusion:

The performance of LLMs in gynecologic oncology is inconsistent, with clear strengths alongside clear limitations. The researchers therefore emphasize the need for continuous evaluation and improvement of these models to ensure their reliability and effectiveness in clinical applications.
