Leveraging Large Language Models in Gynecologic Oncology: A Systematic Review of Current Applications and Challenges
Aya Mudrik, Abraham Tsur, Girish Nadkarni, Orly Efros, Benjamin S Glicksberg, Shelly Soffer, Eyal Klang
DOI: https://doi.org/10.1101/2024.08.08.24311699
2024-08-09
Abstract: Rationale and Objectives: Over the past year, several studies have evaluated the performance of Large Language Models (LLMs), such as ChatGPT, in the field of gynecologic oncology. This review analyzes the applications and risks associated with using LLMs in this specialized field.
Materials and Methods: This systematic review was performed in adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, incorporating elements from the diagnostic test accuracy extension and the CHARMS checklist for reviews of prediction models. A systematic literature search was executed on July 17, 2024, across PubMed, Web of Science, and Scopus databases. We focused on identifying original research that integrates LLMs with gynecologic oncology. We assessed the risk of bias using the adapted QUADAS-2 criteria.
Results: Our search identified eight studies that met our criteria, covering healthcare education, clinical practice, and medical code generation. These studies revealed variability in ChatGPT's performance across applications. It excelled in genetic testing and counseling, achieving a 97% accuracy rate, but its performance on cervical cancer prevention was less robust, at 83% accuracy. While one study demonstrated high adherence of ChatGPT's outputs to quality guidelines, another found that established guidelines significantly outperformed ChatGPT's outputs. Additionally, code generation with tools such as Google Bard and RoBERTa has shown potential to improve accuracy in clinical predictions and quality assurance. For example, Natural Language Processing (NLP) assisted by RoBERTa (a model based on Google's BERT) improved the prediction of residual disease in women with advanced epithelial ovarian cancer following cytoreductive surgery. Despite these advances, challenges related to consistency, specificity, and personalization persist, underscoring the need for continued refinement of these technologies.
Conclusion: LLMs demonstrate inconsistent performance in gynecologic oncology. These findings emphasize the need for continuous evaluation of these models before they are implemented clinically.
Obstetrics and Gynecology
What problem does this paper attempt to address?
This paper analyzes the applications of large language models (LLMs) in gynecologic oncology and the risks associated with them. Specifically, through a systematic review of current research, the researchers evaluate how these models perform in medical education, clinical practice, and medical code generation, and explore their potential and limitations in this specialized field.
### Problems the paper attempts to solve:
1. **Evaluate the application of LLMs in gynecologic oncology**: The researchers aim to understand how LLMs are used across different aspects of gynecologic oncology, such as medical education, clinical practice, and medical code generation.
2. **Analyze the performance of LLMs in gynecologic oncology**: Through a systematic review of existing research, evaluate how these models perform in different application scenarios, including their accuracy, consistency, and ability to personalize output.
3. **Identify the challenges of LLMs in gynecologic oncology**: Explore the problems these models may encounter in practice, such as data privacy, model interpretability, and integration with existing medical systems.
4. **Propose directions for future research**: Based on current findings, identify the key areas where future research could improve how LLMs are applied in gynecologic oncology.
### Main research methods:
- **Systematic review**: Following the PRISMA guidelines, the authors searched PubMed, Web of Science, and Scopus on July 17, 2024, and selected the original studies that met the inclusion criteria.
- **Risk-of-bias assessment**: The adapted QUADAS-2 criteria were used to assess the risk of bias in the included studies.
- **Data synthesis**: Because the study designs and outcomes were heterogeneous, a narrative synthesis rather than a meta-analysis was used to summarize the diverse applications, advantages, and challenges of LLMs in gynecologic oncology.
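The risk-of-bias step can be made concrete with a small data model. The sketch below is not the authors' tool: it assumes the four standard QUADAS-2 domains (patient selection, index test, reference standard, flow and timing, with applicability concerns rated for the first three) and the summary rule suggested in the QUADAS-2 guidance, under which a study is rated low risk only when every bias domain is rated low. How the review adapted QUADAS-2 for LLM studies is not described here, so the domain keys, the `StudyAssessment` class, and the study ID are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Judgment(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# Standard QUADAS-2 domains (assumed; the review's adapted version may differ).
BIAS_DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
# Applicability concerns are rated for the first three domains only.
APPLICABILITY_DOMAINS = ("patient_selection", "index_test", "reference_standard")


@dataclass
class StudyAssessment:
    """Hypothetical per-study record of QUADAS-2 judgments."""
    study_id: str
    risk_of_bias: dict[str, Judgment]
    applicability: dict[str, Judgment]

    def overall_risk_of_bias(self) -> str:
        # Every bias domain must be judged before a summary is given.
        missing = set(BIAS_DOMAINS) - self.risk_of_bias.keys()
        if missing:
            raise ValueError(f"missing bias domains: {sorted(missing)}")
        # Assumed summary rule: "low" only if all domains are low;
        # any high or unclear domain flags the study as "at risk".
        if all(self.risk_of_bias[d] is Judgment.LOW for d in BIAS_DOMAINS):
            return "low"
        return "at risk"

    def overall_applicability(self) -> str:
        # Same rule applied to the three applicability domains.
        if all(self.applicability.get(d) is Judgment.LOW for d in APPLICABILITY_DOMAINS):
            return "low concern"
        return "concern"


# Usage with a hypothetical study whose index-test domain is unclear.
example = StudyAssessment(
    study_id="hypothetical-study-1",
    risk_of_bias={
        "patient_selection": Judgment.LOW,
        "index_test": Judgment.UNCLEAR,
        "reference_standard": Judgment.LOW,
        "flow_and_timing": Judgment.LOW,
    },
    applicability={d: Judgment.LOW for d in APPLICABILITY_DOMAINS},
)
print(example.study_id, example.overall_risk_of_bias(), example.overall_applicability())
```

Keeping bias and applicability as separate judgments mirrors the QUADAS-2 structure, so a study can be at risk of bias while still raising low concern about applicability.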
### Main findings:
- **Medical education**: LLMs showed high accuracy (97%) when answering questions on genetic testing and counseling, but performed less well (83% accuracy) on cervical cancer prevention questions.
- **Clinical practice**: The quality of LLM treatment recommendations was uneven, and the recommendations sometimes lacked personalized assessment.
- **Medical code generation**: LLMs showed potential in generating medical code, could improve the efficiency of quality assurance audits, and outperformed traditional methods in predicting residual disease.
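Accuracy figures such as 97% and 83% are proportions of correctly answered questions, and how far apart two such proportions really are depends on how many questions were asked. As a hedged illustration, not an analysis taken from the review, the sketch below computes accuracy and a Wilson score interval for it. The question counts in the usage lines are hypothetical and are not the counts used in the included studies.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Proportion of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    p = accuracy(correct, total)
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts, chosen only to give roughly 97% and 83% accuracy.
print(accuracy(29, 30), wilson_interval(29, 30))
print(accuracy(25, 30), wilson_interval(25, 30))
```

With only 30 hypothetical questions per topic, the two intervals overlap, which is why the number of questions matters when comparing accuracy figures across applications.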
### Conclusion:
The performance of LLMs in gynecologic oncology is inconsistent, with clear strengths alongside notable limitations. The researchers therefore emphasize the need for continuous evaluation and improvement of these models to ensure their reliability and effectiveness before clinical use.