Retrieval-augmented large language models for clinical trial screening.

Ryan Tan, Si Xian Ho, Shiyun Vivianna Fequira Oo, Shi Ling Chua, Ma Wai Wai Zaw, Daniel Shao-Weng Tan
DOI: https://doi.org/10.1200/jco.2024.42.16_suppl.e13611
IF: 45.3
2024-05-31
Journal of Clinical Oncology
Abstract: e13611

Background: Clinical trial screening is currently manual and laborious. We tested several large language models enhanced with retrieval-augmented generation (RAG-LLM) to assess their performance on this task.

Methods: We extracted eligibility criteria of 184 oncology trials with FDA approval notifications between 8 January 2020 and 18 January 2024, as well as information on cancer staging and performance status scoring, for the RAG-LLM vector database. A medical oncologist and 2 senior clinical trial coordinators developed a test set of 975 synthetic patient profiles, each of which included primary site, stage, prior therapy, tumor mutations and one additional clinical feature. Each profile was paired with one of the 184 trials and annotated for ground-truth eligibility and the reason for it. This process was repeated for a validation set of 240 longer and more challenging profiles paired with 8 ongoing trials. We developed RAG-LLMs with 4 leading LLMs (Zephyr-7B, Med42, GPT-3.5, GPT-4) and evaluated their accuracy in determining trial eligibility, along with retrieval-augmented generation assessment (RAGAs) metrics. A response was deemed accurate only if it correctly assigned both trial eligibility and the reason for assignment.

Results: GPT-4 performed best, achieving an accuracy of 95.18% on the test set and 80.00% on the validation set, with a mean inference time of 10.95 seconds. It also demonstrated the highest answer relevancy, context relevancy and faithfulness (Table).

Conclusions: Our results demonstrate potential for RAG-LLMs to assist with trial screening at scale. Further evaluation in real-world cohorts, using electronic records and full protocol data and tracking the impact on trial enrolment, can be explored within secure firewalls.

[Table: see text]
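The strict scoring rule stated in the Methods (a response counts as accurate only if both the eligibility decision and the reason match the annotation) can be illustrated with a minimal sketch. The abstract does not say how reasons were compared, so this example assumes each reason is a categorical label matched exactly; the `Judgment` class, the label strings and the function names are all hypothetical and are not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One eligibility call for a (patient profile, trial) pair."""
    eligible: bool
    reason: str  # ASSUMPTION: reasons are categorical labels compared exactly


def is_accurate(pred: Judgment, truth: Judgment) -> bool:
    # Strict rule: both the eligibility flag and the reason must match.
    return pred.eligible == truth.eligible and pred.reason == truth.reason


def strict_accuracy(preds: list[Judgment], truths: list[Judgment]) -> float:
    """Fraction of responses that match on eligibility AND reason."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not truths:
        raise ValueError("cannot score an empty set")
    hits = sum(is_accurate(p, t) for p, t in zip(preds, truths))
    return hits / len(truths)


# Hypothetical annotations and model outputs, for illustration only.
truths = [
    Judgment(True, "meets all criteria"),
    Judgment(False, "stage"),
    Judgment(False, "prior therapy"),
    Judgment(True, "meets all criteria"),
]
preds = [
    Judgment(True, "meets all criteria"),   # correct on both counts
    Judgment(False, "stage"),               # correct on both counts
    Judgment(False, "stage"),               # right decision, wrong reason
    Judgment(False, "stage"),               # wrong decision
]
score = strict_accuracy(preds, truths)
```

Under this rule the third response is scored as inaccurate even though its eligibility decision is right, which makes the metric more demanding than plain eligibility accuracy.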
oncology