Evaluating the accuracy of large language model (ChatGPT) in providing information on metastatic breast cancer

Satya Sesha Sai Kiran S Pindiprolu, Sathis Kumar D, Nagasen Dasari, Ramakrishna Gummadi
DOI: https://doi.org/10.34172/apb.2024.060
2024-07-31
Advanced Pharmaceutical Bulletin
Abstract:
Background: Artificial intelligence (AI), particularly large language models like ChatGPT developed by OpenAI, has shown promise in transforming various domains, including medicine. ChatGPT's ability to generate human-like responses on a wide array of subjects has been leveraged in various fields, and it has notably passed the rigorous United States Medical Licensing Examination (USMLE) Step 1. However, its proficiency in responding to inquiries related to breast cancer, a prevalent and complex disease spectrum, remains underexplored.
Objective: This study aims to evaluate the accuracy and comprehensiveness of ChatGPT's responses to commonly asked questions about breast cancer, filling a critical gap in the literature and clarifying its potential to enhance patient education and support in breast cancer management.
Methods: A list of 100 questions was curated from frequently asked questions on Cancer.net and the National Breast Cancer Foundation's website, combined with inquiries commonly received in clinical practice. These questions were entered into the ChatGPT (Version 4.0) user interface, and two primary experts rated the accuracy of each response on a four-point scale. Discrepancies in scoring were resolved by additional expert review.
Results: Of the 100 evaluated responses, 5 were entirely inaccurate, 22 were partially accurate, 42 were accurate but not comprehensive, and 31 were highly accurate. Across the question categories, the majority of responses were at least partially accurate, demonstrating ChatGPT's potential in providing reliable information on breast cancer.
Conclusions: ChatGPT exhibits potential as a supplementary resource for patient education on breast cancer. While its responses were generally accurate, the presence of inaccuracies highlights the importance of professional oversight.
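The reported distribution can be tallied in a few lines. The following is a minimal sketch, not the authors' analysis code: the counts come from the Results, while the label strings and variable names are shorthand assumed here for illustration.

```python
# Counts per rating level as reported in the Results.
# The label strings are shorthand chosen for this sketch.
ratings = {
    "entirely inaccurate": 5,
    "partially accurate": 22,
    "accurate but not comprehensive": 42,
    "highly accurate": 31,
}

# Total number of evaluated responses.
total = sum(ratings.values())

# Share of each rating level, as a percentage of all responses.
shares = {label: 100 * count / total for label, count in ratings.items()}

# Responses in the two highest rating levels.
accurate_or_better = (
    ratings["accurate but not comprehensive"] + ratings["highly accurate"]
)

for label, pct in shares.items():
    print(f"{label}: {pct:.0f}%")
print(f"accurate or highly accurate: {accurate_or_better} of {total}")
```

Because the study evaluated exactly 100 responses, each count is also its percentage. Summing the two highest levels shows how many responses were rated accurate or better, with the remainder being partially or entirely inaccurate.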