Can ChatGPT provide accurate financial toxicity resources to patients with cancer?

Jharna Mehul Patel, Erin Leigh Miller, Tal Cantor, Fumiko Chino, Bridgette Thom, Marina Stasenko, Emeline Mariam Aviki
DOI: https://doi.org/10.1200/jco.2024.42.16_suppl.e13638
IF: 45.3
2024-05-31
Journal of Clinical Oncology
Abstract e13638

Background: Artificial intelligence (AI) has potential benefits in medicine, including improving patient education, but its usefulness in mitigating the financial toxicity faced by patients with cancer is unknown. We aimed to investigate the accuracy of responses from ChatGPT, an open-access AI tool, to common financial concerns among patients with cancer.

Methods: ChatGPT was queried with 50 questions developed from oncologic society websites. The questions comprised medical (n=25) and non-medical (n=25) financial inquiries related to breast, endometrial, lung, colon, and prostate cancers. Answers were scored by cancer financial toxicity experts on the following scale: 1) correct and comprehensive, 2) correct but not comprehensive, 3) some correct and some incorrect components, and 4) completely incorrect. Score discrepancies were resolved by a third reviewer. In a secondary analysis, all questions were queried 3 times to assess whether ChatGPT could provide more comprehensive answers on subsequent inquiries. The proportion of responses earning each score was calculated overall and within each question category and disease site.

Results: Overall, in the first round of queries, 84% (n=42/50) of questions were scored as correct but not comprehensive and 6% (n=3/50) as having both correct and incorrect components. Approximately 28% (n=14/50) of questions had discrepant scores that required third-party review, and 10% (n=5/50) never achieved a consensus. All non-medical questions were scored as correct but not comprehensive (n=25/25), with a concordance rate of 96% (n=24/25); only 1 question was sent to a third reviewer. Additional queries of non-medical questions did not yield different answers. Of the medical questions, approximately 68% (n=17/25) were scored as correct but not comprehensive and 12% (n=3/25) as having correct and incorrect components; 52% (n=13/25) were sent to a third reviewer, and 20% (n=5/25) never achieved a consensus score. Although additional queries of medical questions yielded different answers, the quality of the responses did not improve. By disease site, prostate cancer had the highest number of discordant questions, with 40% (n=4/10) sent to a third reviewer for scoring, although 100% (n=10/10) of responses were scored as correct but not comprehensive. Reviewers universally reported that ChatGPT responses did not provide helpful information to patients, even when they contained no technically inaccurate information.

Conclusions: ChatGPT does not consistently provide accurate or helpful answers to the financial toxicity concerns commonly raised by patients with cancer. Additional input from experts is necessary to aid AI learning before this resource is used by patients facing financial toxicity.
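The percentages in the Results can be rechecked directly from the reported counts. The following is a minimal sketch in Python, not the authors' analysis code: the dictionary labels and the `percent` helper are names made up for this illustration, and the only data used are the counts stated in the abstract.

```python
from fractions import Fraction

# Counts (numerator, denominator) copied from the abstract's Results.
# The label strings are illustrative names, not the authors' terminology.
reported = {
    "overall: correct but not comprehensive": (42, 50),
    "overall: correct and incorrect components": (3, 50),
    "overall: sent to third reviewer": (14, 50),
    "overall: no consensus": (5, 50),
    "non-medical: concordant": (24, 25),
    "medical: correct but not comprehensive": (17, 25),
    "medical: correct and incorrect components": (3, 25),
    "medical: sent to third reviewer": (13, 25),
    "medical: no consensus": (5, 25),
    "prostate: sent to third reviewer": (4, 10),
}


def percent(count, total):
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * Fraction(count, total))


percentages = {label: percent(c, n) for label, (c, n) in reported.items()}

# Internal consistency checks on the reported counts.
# Medical outcomes (17 + 3 + 5) should account for all 25 medical questions.
medical_total = 17 + 3 + 5
# Overall outcomes (42 + 3 + 5) should account for all 50 questions.
overall_total = 42 + 3 + 5
# Third-reviewer referrals: 1 non-medical + 13 medical should equal 14 overall.
referrals_total = 1 + 13

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Under these assumptions, the three outcome categories reported for first-round queries account for every question (50 overall, 25 medical), and the third-reviewer referrals in the two question categories add up to the overall figure of 14.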
oncology