Abstract

Rationale and Objectives: Over the past year, studies have evaluated the performance of large language models (LLMs), such as ChatGPT, in the field of gynecologic oncology. This review analyzes the applications and risks of using LLMs in this specialized field.

Materials and Methods: This systematic review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, incorporating elements of the diagnostic test accuracy extension and the CHARMS checklist for reviews of prediction models. A systematic literature search of PubMed, Web of Science, and Scopus was conducted on July 17, 2024. We sought original research that applied LLMs in gynecologic oncology. Risk of bias was assessed with adapted QUADAS-2 criteria.

Results: The search identified eight studies that met the inclusion criteria, covering healthcare education, clinical practice, and medical code generation. ChatGPT's performance varied across applications. It excelled in genetic testing and counseling, achieving an accuracy of 97%, but was less robust in cervical cancer prevention, with an accuracy of 83%. While one study reported that ChatGPT's outputs adhered closely to quality guidelines, another found that established guidelines significantly outperformed ChatGPT's outputs. Additionally, code generation with tools such as Google Bard and RoBERTa has shown potential to improve accuracy in clinical prediction and quality assurance. For example, natural language processing (NLP) assisted by RoBERTa (based on Google's BERT model) improved the prediction of residual disease in women with advanced epithelial ovarian cancer after cytoreductive surgery. Despite these advances, challenges in consistency, specificity, and personalization persist, underscoring the need for continued refinement of these technologies.

Conclusion: LLMs show inconsistent performance in gynecologic oncology. These findings emphasize the need for continuous evaluation of these models before they are implemented clinically.
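The abstract states that risk of bias was assessed with adapted QUADAS-2 criteria but does not describe the adaptation. As a minimal sketch, assuming the four standard QUADAS-2 risk-of-bias domains and a commonly used summary convention (overall "low" only if every domain is low, "high" if any domain is high, otherwise "unclear"), a per-study overall judgment could be derived as below. The names `Rating`, `DOMAINS`, and `overall_risk_of_bias` are hypothetical, the summary rule is an assumption rather than the review's stated method, and applicability concerns are omitted.

```python
from enum import Enum
from typing import Mapping


class Rating(Enum):
    """Per-domain risk-of-bias judgment."""
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# The four QUADAS-2 risk-of-bias domains (applicability concerns not modeled).
DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")


def overall_risk_of_bias(ratings: Mapping[str, Rating]) -> str:
    """Summarize domain ratings into one study-level judgment.

    ASSUMPTION: uses a common summary convention (all low -> low,
    any high -> high, otherwise unclear); the review's adapted
    criteria may combine domains differently.
    """
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    values = [ratings[d] for d in DOMAINS]
    if all(v is Rating.LOW for v in values):
        return "low"
    if any(v is Rating.HIGH for v in values):
        return "high"
    return "unclear"


if __name__ == "__main__":
    # Hypothetical ratings for one illustrative study (not data from the review).
    study = {
        "patient_selection": Rating.LOW,
        "index_test": Rating.UNCLEAR,
        "reference_standard": Rating.LOW,
        "flow_and_timing": Rating.LOW,
    }
    print(overall_risk_of_bias(study))  # unclear
```

Requiring all four domains up front makes an incomplete assessment fail loudly instead of silently defaulting to a judgment.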

Large language models to facilitate pregnancy prediction after in vitro fertilization

Efficacy of large language models and their potential in Obstetrics and Gynecology education

Large Language Models forecast Patient Health Trajectories enabling Digital Twins

A Study of Generative Large Language Model for Medical Research and Healthcare

Based on Medicine, The Now and Future of Large Language Models

Using feature optimization and LightGBM algorithm to predict the clinical pregnancy outcomes after in vitro fertilization

Can large language models help predict results from a complex behavioural science study?

Machine learning predicts live-birth occurrence before in-vitro fertilization treatment

Artificial intelligence model to predict pregnancy and multiple pregnancy risk following in vitro fertilization-embryo transfer (IVF-ET)

Clinical data-based modeling of IVF live birth outcome and its application

Leveraging Large Language Models in Gynecologic Oncology: A Systematic Review of Current Applications and Challenges

Live-Birth Prediction of Natural-Cycle In Vitro Fertilization Using 57,558 Linked Cycle Records: A Machine Learning Perspective

Is larger always better? Evaluating and prompting large language models for non-generative medical tasks

Multifactor Prediction of Embryo Transfer Outcomes Based on a Machine Learning Algorithm

An Evaluation of Large Language Models in Bioinformatics Research

Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation

Applications of Large Language Models (LLMs) in Breast Cancer Care

Predicting Learning Performance with Large Language Models: A Study in Adult Literacy

Comparative study of machine learning approaches integrated with genetic algorithm for IVF success prediction

Predicting Lung Cancer Patient Prognosis with Large Language Models