Abstract: Matching patients to clinical trials is a key unsolved challenge in bringing new drugs to market. Today, identifying patients who meet a trial's eligibility criteria is highly manual, taking up to 1 hour per patient. Automated screening is challenging, however, as it requires understanding unstructured clinical text. Large language models (LLMs) offer a promising solution. In this work, we explore their application to trial matching. First, we design an LLM-based system which, given a patient's medical history as unstructured clinical text, evaluates whether that patient meets a set of inclusion criteria (also specified as free text). Our zero-shot system achieves state-of-the-art scores on the n2c2 2018 cohort selection benchmark. Second, we improve the data and cost efficiency of our method by identifying a prompting strategy which matches patients an order of magnitude faster and more cheaply than the status quo, and develop a two-stage retrieval pipeline that reduces the number of tokens processed by up to a third while retaining high performance. Third, we evaluate the interpretability of our system by having clinicians evaluate the natural language justifications generated by the LLM for each eligibility decision, and show that it can output coherent explanations for 97% of its correct decisions and 75% of its incorrect ones. Our results establish the feasibility of using LLMs to accelerate clinical trial operations.

What problem does this paper attempt to address?

This paper asks **how large language models (LLMs) can automate the matching of patients to clinical trials, and thereby speed up bringing new drugs to market**. It focuses on three aspects:

1. **Manual patient screening is time-consuming and inefficient**: Identifying patients who meet a trial's eligibility criteria is currently a highly manual process that can take up to 1 hour per patient. This consumes time and resources, and it slows both the progress of clinical trials and the pace of new drug development.
2. **Automated screening is challenging**: The main difficulty lies in understanding unstructured clinical text. Traditional natural language processing (NLP) methods have limited effectiveness on such text because it contains a large amount of free-form information, such as progress notes, emails, radiology reports, and genetic test results.
3. **Application of zero-shot learning**: To address these problems, the authors propose a zero-shot system based on LLMs that directly evaluates whether a patient meets a trial's inclusion criteria without any fine-tuning or task-specific labeled examples. Because it requires no labeled data, the method can adapt quickly to a new clinical trial and reduces costs.

### Main contributions of the paper

1. **Zero-shot performance evaluation**: The authors study the ability of LLMs to evaluate patient eligibility in a zero-shot setting and find that GPT-4 achieves state-of-the-art performance on the 2018 n2c2 cohort selection benchmark.
2. **Improving data and cost efficiency**: The authors identify a prompting strategy that matches patients an order of magnitude faster and more cheaply than the status quo, and they design a two-stage retrieval pipeline that reduces the number of tokens processed by up to a third while retaining high performance. This makes the system more efficient when processing large-scale electronic health records (EHRs).
3. **Interpretability verification**: Clinicians evaluated the natural-language justifications generated by the LLM for each eligibility decision. The system produced coherent explanations for 97% of its correct decisions and 75% of its incorrect ones, which supports the credibility of its outputs.

### Method overview

The authors design a zero-shot system that uses an LLM to evaluate whether a patient's clinical notes meet specific inclusion criteria. To improve efficiency, they also introduce a two-stage retrieval pipeline, which first selects the most relevant fragments from the patient's notes and then passes only those fragments to the evaluation model. Experimental results show that this pipeline cuts the number of tokens processed by up to a third while retaining high performance.

### Conclusion

The paper demonstrates the potential of LLMs to accelerate patient matching in clinical trials. By combining zero-shot evaluation with an efficient retrieval pipeline, the system can screen for eligible patients quickly and accurately without a large amount of labeled data, which in turn supports new drug development.
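The two-stage retrieval idea in the method overview (select relevant note fragments first, then evaluate only those) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the chunking scheme, the lexical-overlap scorer, the prompt wording, and every function name here (`split_into_chunks`, `overlap_score`, `retrieve_top_k`, `build_prompt`, `assess_criterion`) are assumptions made for illustration, and the `llm` argument is a placeholder for whatever model call is actually used. A real system would more plausibly rank fragments with an embedding model.

```python
import re
from typing import Callable, List, Set


def split_into_chunks(note: str, max_words: int = 40) -> List[str]:
    """Split one clinical note into consecutive chunks of at most max_words words."""
    words = note.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(criterion: str, chunk: str) -> float:
    """Fraction of criterion tokens that also occur in the chunk.

    Assumption: a simple lexical stand-in for a learned relevance model.
    """
    criterion_tokens = tokenize(criterion)
    if not criterion_tokens:
        return 0.0
    return len(criterion_tokens & tokenize(chunk)) / len(criterion_tokens)


def retrieve_top_k(criterion: str, chunks: List[str], k: int = 2) -> List[str]:
    """Stage 1: keep the k chunks that score highest against the criterion."""
    ranked = sorted(chunks, key=lambda c: overlap_score(criterion, c), reverse=True)
    return ranked[:k]


def build_prompt(criterion: str, chunks: List[str]) -> str:
    """Assemble a zero-shot prompt from the criterion and the retrieved chunks.

    The wording is hypothetical, not the prompt used in the paper.
    """
    excerpts = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Patient note excerpts:\n"
        f"{excerpts}\n\n"
        f"Inclusion criterion: {criterion}\n"
        "Does the patient meet this criterion? Answer MET or NOT MET "
        "and give a one-sentence justification."
    )


def assess_criterion(
    criterion: str,
    notes: List[str],
    llm: Callable[[str], str],
    k: int = 2,
) -> str:
    """Stage 2: send only the retrieved chunks, not the full record, to the LLM."""
    chunks = [chunk for note in notes for chunk in split_into_chunks(note)]
    selected = retrieve_top_k(criterion, chunks, k)
    return llm(build_prompt(criterion, selected))
```

Calling `assess_criterion("Diagnosis of diabetes", notes, llm=my_model_call, k=1)` would pass a single best-matching fragment to the hypothetical `my_model_call` function. The token savings come from `k`: the smaller the number of retrieved chunks relative to the full record, the fewer tokens the evaluation model has to process.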

Zero-Shot Clinical Trial Patient Matching with LLMs

Learning to match patients to clinical trials using large language models

End-To-End Clinical Trial Matching with Large Language Models

Utilizing Large Language Models for Enhanced Clinical Trial Matching: A Study on Automation in Patient Screening

Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology

PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models

PRISM: Patient Records Interpretation for Semantic clinical trial Matching system using large language models

Distilling Large Language Models for Matching Patients to Clinical Trials

LLM for Patient-Trial Matching: Privacy-Aware Data Augmentation Towards Better Performance and Generalizability

Enhancing Biomarker-Based Oncology Trial Matching Using Large Language Models

Automated Matching of Patients to Clinical Trials: A Patient-Centric Natural Language Processing Approach for Pediatric Leukemia

Matching Patients to Clinical Trials with Large Language Models

Transforming clinical trials: the emerging roles of large language models

Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching

Application of a general LLM-based classification system to retrieve information about oncological trials

Novel Development of LLM Driven mCODE Data Model for Improved Clinical Trial Matching to Enable Standardization and Interoperability in Oncology Research

Retrieval-Reasoning Large Language Model-based Synthetic Clinical Trial Generation

Controlled LLM-based Reasoning for Clinical Trial Retrieval

Large Language Model Augmented Clinical Trial Screening

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records