Panacea: A foundation model for clinical trial search, summarization, design, and recruitment

Jiacheng Lin, Hanwen Xu, Zifeng Wang, Sheng Wang, Jimeng Sun
DOI: https://doi.org/10.1101/2024.06.26.24309548
2024-06-27
Abstract: Clinical trials are fundamental in developing new drugs, medical devices, and treatments. However, they are often time-consuming and have low success rates. Although there have been initial attempts to create large language models (LLMs) for clinical trial design and patient-trial matching, these models remain task-specific and are not adaptable to diverse clinical trial tasks. To address this challenge, we propose a clinical trial foundation model named Panacea, designed to handle multiple tasks, including trial search, trial summarization, trial design, and patient-trial matching. We also assemble a large-scale dataset, named TrialAlign, of 793,279 trial documents and 1,113,207 trial-related scientific papers, to infuse clinical knowledge into the model through pre-training. We further curate TrialInstruct, which contains 200,866 instruction examples for fine-tuning. These resources enable Panacea to be widely applicable to a range of clinical trial tasks based on user requirements. We evaluated Panacea on a new benchmark, named TrialPanorama, which covers eight clinical trial tasks. Our method performed the best on seven of the eight tasks compared to six cutting-edge generic or medicine-specific LLMs. Specifically, Panacea showed great potential to collaborate with human experts in crafting the design of eligibility criteria, study arms, and outcome measures in multi-round conversations. In addition, Panacea achieved a 14.42% improvement in patient-trial matching and a 41.78% to 52.02% improvement in trial search, and consistently ranked at the top for five aspects of trial summarization. Our approach demonstrates the effectiveness of Panacea in clinical trials and establishes a comprehensive resource, including training data, model, and benchmark, for developing clinical trial foundation models, paving the path for AI-based clinical trial development.
Public and Global Health
What problem does this paper attempt to address?
This paper addresses the fact that existing LLMs for clinical trials are task-specific and do not adapt to diverse clinical trial tasks. It proposes Panacea, a clinical trial foundation model designed to handle multiple tasks: trial search, trial summarization, trial design, and patient-trial matching. Panacea acquires clinical knowledge by pre-training on TrialAlign, a large-scale dataset of 793,279 trial documents and 1,113,207 trial-related scientific papers. It is then fine-tuned on TrialInstruct, a set of 200,866 instruction examples, so that it can be applied to a range of tasks according to user requirements. On TrialPanorama, a new benchmark covering eight clinical trial tasks, Panacea is compared with six generic or medicine-specific LLMs and performs best on seven of the eight tasks. The reported gains include a 14.42% improvement in patient-trial matching, a 41.78% to 52.02% improvement in trial search, and a consistent top ranking on five aspects of trial summarization. The paper also reports that Panacea shows potential to collaborate with human experts, over multi-round conversations, on designing eligibility criteria, study arms, and outcome measures. Finally, the released training data, model, and benchmark form a resource for developing clinical trial foundation models and for advancing AI-based clinical trial development.