Matching Patients to Clinical Trials with Large Language Models

Qiao Jin, Zifeng Wang, Charalampos S. Floudas, Fangyuan Chen, Changlin Gong, Dara Bracken-Clarke, Elisabetta Xue, Yifan Yang, Jimeng Sun, Zhiyong Lu
2024-04-28
Abstract: Clinical trials are often hindered by the challenge of patient recruitment. In this work, we introduce TrialGPT, a first-of-its-kind large language model (LLM) framework to assist patient-to-trial matching. Given a patient note, TrialGPT predicts the patient's eligibility on a criterion-by-criterion basis and then consolidates these predictions to assess the patient's eligibility for the target trial. We evaluate the trial-level prediction performance of TrialGPT on three publicly available cohorts of 184 patients with over 18,000 trial annotations. We also engaged three physicians to label over 1,000 patient-criterion pairs to assess its criterion-level prediction accuracy. Experimental results show that TrialGPT achieves a criterion-level accuracy of 87.3% with faithful explanations, close to the expert performance (88.7%-90.0%). The aggregated TrialGPT scores are highly correlated with human eligibility judgments, and they outperform the best-competing models by 32.6% to 57.2% in ranking and excluding clinical trials. Furthermore, our user study reveals that TrialGPT can significantly reduce the screening time (by 42.6%) in a real-life clinical trial matching task. These results and analyses have demonstrated promising opportunities for clinical trial matching with LLMs such as TrialGPT.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the difficulty of recruiting patients for clinical trials, specifically the task of matching patients to suitable trials. The authors introduce TrialGPT, a large language model (LLM) framework that assists patient-to-trial matching. Given a patient note, TrialGPT predicts the patient's eligibility criterion by criterion and then consolidates these predictions to assess overall eligibility for the target trial. Trial-level performance was evaluated on three publicly available cohorts totaling 184 patients with over 18,000 trial annotations, and three physicians labeled over 1,000 patient-criterion pairs to assess criterion-level accuracy. TrialGPT reached a criterion-level accuracy of 87.3%, close to expert performance (88.7%-90.0%). Its aggregated scores outperformed the best competing models by 32.6% to 57.2% in ranking and excluding clinical trials, and in a user study it reduced screening time by 42.6% in a real-life matching task. The work shows the potential of LLMs such as TrialGPT to make clinical trial matching more efficient.
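
To make the two-stage idea concrete (criterion-level predictions consolidated into a trial-level assessment), here is a minimal sketch of one possible aggregation. It is not TrialGPT's actual scoring method, which the text above does not specify. The label names (`met`, `not_met`, `unknown`), the `CriterionPrediction` type, and the scoring rule (share of inclusion criteria met minus share of exclusion criteria met) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CriterionPrediction:
    kind: str   # "inclusion" or "exclusion"
    label: str  # hypothetical label set: "met", "not_met", "unknown"


def fraction_met(preds):
    """Share of criteria labeled "met"; 0.0 for an empty list."""
    if not preds:
        return 0.0
    return sum(p.label == "met" for p in preds) / len(preds)


def trial_score(preds):
    """Illustrative trial-level score in [-1, 1].

    Higher when more inclusion criteria are met, lower when more
    exclusion criteria are met. This rule is an assumption, not the
    paper's formula.
    """
    inclusion = [p for p in preds if p.kind == "inclusion"]
    exclusion = [p for p in preds if p.kind == "exclusion"]
    return fraction_met(inclusion) - fraction_met(exclusion)


def rank_trials(trials):
    """Return trial IDs sorted from most to least promising."""
    return sorted(trials, key=lambda t: trial_score(trials[t]), reverse=True)


# Hypothetical criterion-level predictions for one patient and two trials.
trials = {
    "trial_A": [
        CriterionPrediction("inclusion", "met"),
        CriterionPrediction("inclusion", "met"),
        CriterionPrediction("exclusion", "not_met"),
    ],
    "trial_B": [
        CriterionPrediction("inclusion", "met"),
        CriterionPrediction("inclusion", "not_met"),
        CriterionPrediction("exclusion", "met"),
    ],
}
ranking = rank_trials(trials)
```

In this toy input, `trial_A` scores 1.0 (all inclusion criteria met, no exclusion criterion met) and `trial_B` scores -0.5, so `trial_A` ranks first. In a real system the criterion labels would come from the LLM's per-criterion output, and the aggregation rule would need to be validated against human eligibility judgments.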