SEETrials: Leveraging Large Language Models for Safety and Efficacy Extraction in Oncology Clinical Trials

Kyeryoung Lee, Hunki Paek, Liang-Chin Huang, C. Beau Hilton, Surabhi Datta, Josh Higashi, Nneka Ofoegbu, Jingqi Wang, Samuel M Rubinstein, Andrew J Cowan, Mary Kwok, Jeremy L Warner, Hua Xu, Xiaoyan Wang
DOI: https://doi.org/10.1101/2024.01.18.24301502
2024-05-13
Abstract:
Background: Initial insights into oncology clinical trial outcomes are often gleaned manually from conference abstracts. We aimed to develop an automated system to extract safety and efficacy information from study abstracts with high precision and fine granularity, transforming them into computable data for timely clinical decision-making.
Methods: We collected clinical trial abstracts from key conferences and PubMed (2012-2023). The SEETrials system was developed with four modules: preprocessing, prompt modeling, knowledge ingestion, and postprocessing. We evaluated the system performance qualitatively and quantitatively and assessed its generalizability across different cancer types: multiple myeloma (MM), breast, lung, lymphoma, and leukemia. Furthermore, the efficacy and safety of innovative therapies in MM, including CAR-T, bispecific antibodies, and antibody-drug conjugates (ADC), were analyzed across a large set of clinical trial studies.
Results: SEETrials achieved high precision (0.958), recall (sensitivity) (0.944), and F1 score (0.951) across 70 data elements present in the MM trial studies. Generalizability tests on four additional cancers yielded precision, recall, and F1 scores within the 0.966-0.986 range. The distribution of safety and efficacy-related entities varied across therapies, with certain adverse events more common in specific treatments. Comparative performance analysis using overall response rate (ORR) and complete response (CR) highlighted differences among therapies: CAR-T (ORR: 88%, 95% CI: 84-92%; CR: 59%, 95% CI: 53-66%), bispecific antibodies (ORR: 64%, 95% CI: 55-73%; CR: 27%, 95% CI: 16-37%), and ADC (ORR: 51%, 95% CI: 37-65%; CR: 26%, 95% CI: 1-51%). Notable study heterogeneity (I² heterogeneity index above 75%) was identified across several outcome entities analyzed within therapy subgroups.
Conclusion: SEETrials demonstrated highly accurate data extraction and versatility across different therapeutics and various cancer domains. Its automated processing of large datasets facilitates nuanced data comparisons, promoting the swift and effective dissemination of clinical insights.
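The Results report pooled response rates with 95% confidence intervals and an I2 heterogeneity index above 75% for several outcomes. The abstract does not say which pooling model the authors used, so the following is only a minimal sketch of one common approach: fixed-effect inverse-variance pooling of raw proportions, with Cochran's Q and I2 derived from it. The function name and the input numbers are hypothetical and are not taken from the paper.

```python
import math


def pooled_proportion_and_i2(events, totals):
    """Pool per-study response rates and compute the I^2 heterogeneity index.

    Assumption: fixed-effect inverse-variance weighting on raw proportions.
    This is an illustrative choice, not the method confirmed by the paper.
    """
    props = [e / n for e, n in zip(events, totals)]
    if any(p <= 0.0 or p >= 1.0 for p in props):
        # A 0% or 100% rate has zero estimated variance; real analyses apply
        # a continuity correction or a transformation in that case.
        raise ValueError("each study needs a response rate strictly between 0 and 1")

    # Binomial variance of each proportion and the resulting weights.
    variances = [p * (1.0 - p) / n for p, n in zip(props, totals)]
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    pooled = sum(w * p for w, p in zip(weights, props)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, props))
    df = len(props) - 1
    # I^2 is the share of variation beyond what chance alone would explain.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, ci, q, i2


# Hypothetical example: three studies with clearly different response rates.
pooled, ci, q, i2 = pooled_proportion_and_i2([45, 20, 70], [50, 40, 100])
print(f"pooled={pooled:.3f}, CI=({ci[0]:.3f}, {ci[1]:.3f}), Q={q:.2f}, I2={i2:.1f}%")
```

With rates as far apart as in this made-up input, I2 lands well above the 75% threshold that the abstract uses to flag notable heterogeneity. Identical rates across studies give an I2 of zero.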
What problem does this paper attempt to address?
This paper addresses how to automatically extract safety and efficacy information from oncology clinical trial abstracts, efficiently and at fine granularity, and transform it into computable data for timely clinical decision-making. To do so, the researchers developed a system called SEETrials, which consists of four modules: preprocessing, prompt modeling, knowledge ingestion, and postprocessing. SEETrials uses large language models, such as GPT-4, to process clinical trial abstracts collected from key conferences and PubMed. Evaluated on trials in multiple myeloma (MM), breast cancer, lung cancer, lymphoma, and leukemia, the system demonstrates high accuracy and generalizability. It can process large volumes of abstracts automatically and supports nuanced comparisons across studies, which speeds up the dissemination of clinical insights. The paper evaluates the extraction accuracy of SEETrials both quantitatively and qualitatively and shows that it adapts well across different therapies and cancer types. The study also reveals differences in safety and efficacy among therapies such as CAR-T cell therapy, bispecific antibodies, and antibody-drug conjugates. For example, CAR-T therapy shows the highest overall response rate of the three, while certain adverse events are more common with specific treatments. In summary, the paper tackles the challenge of automatically extracting safety and efficacy information from oncology clinical trial literature to support faster and more effective clinical decision-making.
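The quantitative evaluation mentioned above is reported as precision, recall, and F1. As a minimal sketch, assuming the standard definitions based on counts of true positives, false positives, and false negatives, the metrics can be computed as follows. The helper names are hypothetical; the only values taken from the abstract are the reported MM precision (0.958) and recall (0.944), used to check that they are consistent with the reported F1 of 0.951.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard extraction metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Consistency check against the figures reported for the MM trials.
print(round(f1_from(0.958, 0.944), 3))  # 0.951

# Hypothetical counts, for illustration only: 8 correct extractions,
# 2 spurious extractions, 2 missed data elements.
print(precision_recall_f1(tp=8, fp=2, fn=2))
```

The harmonic mean penalizes an imbalance between precision and recall, which is why F1 is usually reported next to the two individual scores rather than in place of them.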