Results of the COMPARE-GPT study: Comparison of medication package inserts and GPT-4 cancer drug information.

Jasmin Hundal, Eleonora Teplinsky
DOI: https://doi.org/10.1200/jco.2024.42.16_suppl.e13646
IF: 45.3
2024-05-31
Journal of Clinical Oncology
Abstract: e13646. Background: The integration of artificial intelligence (AI) technologies has opened new avenues for information dissemination. GPT-4 (GPT) is a large language model developed by OpenAI. Concerns about its accuracy exist, especially in health communications. We conducted a study to evaluate the accuracy of GPT responses to questions about antineoplastic agents for solid tumor malignancies. Methods: We evaluated GPT responses to four questions (Table 1) for each solid tumor drug newly approved or approved for a new indication between 2020 and 2022. The responses from GPT were saved and then compared to the information provided in the medication package insert. For discordant results, a second search was run using a new GPT session. The study was performed by two physicians who ensured that the comparison was both rigorous and unbiased. Results: 53 unique antineoplastic agents were included. GPT responses regarding FDA approval and mechanism of action were correct for each drug (100%). When asked about common adverse reactions, GPT provided incorrect responses for 47% (25/53) of the medications and correct responses for 53% (28/53). The inaccurate responses were missing certain side effects (88%), provided inaccurate incidence rates (8%), and/or included side effects not listed on the package insert (20%). When a second search was conducted with the same question, GPT provided different responses in 76% (19/25), the same responses in 16% (4/25), and now correct responses in 8% (2/25). When asked about drug warnings and precautions, GPT provided incorrect responses for 68% (36/53) of the medications and correct responses for 32% (17/53). The inaccurate responses were missing certain warnings and precautions (89%) and/or included warnings and precautions not listed on the package insert (25%).
When a second search was conducted with the same question, GPT provided different responses in 53% (19/36), the same responses in 39% (14/36), and now correct responses in 8% (3/36). Conclusions: This analysis of solid tumor drugs approved for use in 2020-2022 evaluated GPT's ability to provide drug information. GPT accurately identified FDA-approved indications and mechanisms of action but had significant limitations in reporting a comprehensive list of adverse reactions, drug warnings, and precautions. GPT occasionally included drug side effects that are not listed in the package insert. Significant variability in output was demonstrated in repeated searches. Given the inconsistencies identified, GPT should not be used as a primary source of medical information and may potentially cause harm. [Table: see text]
oncology