End-To-End Clinical Trial Matching with Large Language Models

Dyke Ferber, Lars Hilgers, Isabella C. Wiest, Marie-Elisabeth Leßmann, Jan Clusmann, Peter Neidlinger, Jiefu Zhu, Georg Wölflein, Jacqueline Lammert, Maximilian Tschochohei, Heiko Böhme, Dirk Jäger, Mihaela Aldea, Daniel Truhn, Christiane Höper, Jakob Nikolas Kather
2024-07-18
Abstract: Matching cancer patients to clinical trials is essential for advancing treatment and patient care. However, the inconsistent format of medical free-text documents and complex trial eligibility criteria make this process extremely challenging and time-consuming for physicians. We investigated whether the entire trial matching process - from identifying relevant trials among 105,600 oncology-related clinical trials on ClinicalTrials.gov (http://clinicaltrials.gov) to generating criterion-level eligibility matches - could be automated using Large Language Models (LLMs). Using GPT-4o and a set of 51 synthetic Electronic Health Records (EHRs), we demonstrate that our approach identifies relevant candidate trials in 93.3% of cases and achieves a preliminary accuracy of 88.0% when matching patient-level information at the criterion level against a baseline defined by human experts. Utilizing LLM feedback reveals that 39.3% of the criteria that were initially considered incorrect are either ambiguous or inaccurately annotated, leading to a total model accuracy of 92.7% after refining our human baseline. In summary, we present an end-to-end pipeline for clinical trial matching using LLMs, demonstrating high precision in screening and matching trials to individual patients, even outperforming qualified medical doctors. Our fully end-to-end pipeline can operate autonomously or with human supervision and is not restricted to oncology, offering a scalable solution for enhancing patient-trial matching in real-world settings.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper attempts to solve is improving the efficiency and accuracy of clinical trial matching for cancer patients. Specifically:

1. **Background**:
   - Identifying clinical trials suitable for cancer patients is crucial for advancing treatment methods and improving patient care.
   - However, because free-text medical documents are inconsistently formatted and trial eligibility criteria follow complex logic, this process is extremely challenging for doctors, as well as time-consuming and error-prone.
   - As a result, cancer patients are under-enrolled in clinical trials, and timely enrollment in particular suffers.
2. **Objectives**:
   - Use large language models (LLMs) to automate and optimize the clinical trial matching process.
   - Match more patients to suitable clinical trials by reducing the time and errors involved in manual screening.
   - Provide an end-to-end solution that can efficiently screen for suitable clinical trials at scale and precisely match them to individual patients.
3. **Methods**:
   - Generate 51 synthetic oncology patient electronic health records (EHRs).
   - Use GPT-4o to select suitable trial candidates from 105,600 oncology-related clinical trials worldwide.
   - Have the LLM check each patient against the eligibility criteria of the selected trials one criterion at a time, and compare the results with a baseline defined by human experts.
   - Use LLM feedback to iteratively review the disagreements between the LLM and human results, and adjust the human baseline where necessary.
4. **Results**:
   - The method identified the relevant, human-preselected candidate trials among all trials worldwide in 93.3% of the test cases.
   - With the initial human evaluation as the baseline, the preliminary accuracy of matching patient information at the criterion level was 88.0% (1,398/1,589).
   - Re-evaluating the human annotations using LLM feedback showed that 39.3% of the criteria initially considered incorrect were actually ambiguous or mislabeled by humans, and the final model accuracy reached 92.7%.
5. **Conclusions**:
   - The study demonstrates an end-to-end clinical trial matching pipeline using LLMs that achieves high precision when screening suitable clinical trials at scale and even outperforms qualified doctors in matching selected candidate trials to individual patients.
   - The pipeline can operate fully autonomously or under human supervision and is not limited to cancer, providing a scalable solution for patient-trial matching in the real world.

Through these methods and results, the paper aims to address the inefficiency and insufficient accuracy of the current clinical trial matching process.
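
The accuracy figures quoted in the Results can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not the authors' evaluation code. It assumes that every criterion found to be ambiguous or mislabeled was counted as correct after the baseline was refined, and the absolute number of such criteria is inferred from the quoted 39.3% rather than taken from the text. All variable names are illustrative.

```python
# Figures quoted in the summary above.
total_criteria = 1589      # criteria evaluated against the human baseline
initially_correct = 1398   # criteria matching the initial human baseline

initially_incorrect = total_criteria - initially_correct

# ASSUMPTION: 39.3% of the initially incorrect criteria were re-labelled and
# then counted as correct. The absolute count is inferred, not quoted.
reclassified = round(0.393 * initially_incorrect)

preliminary_accuracy = initially_correct / total_criteria
refined_accuracy = (initially_correct + reclassified) / total_criteria

print(f"initially incorrect: {initially_incorrect}")
print(f"reclassified (inferred): {reclassified}")
print(f"preliminary accuracy: {preliminary_accuracy:.1%}")
print(f"refined accuracy: {refined_accuracy:.1%}")
```

Under that assumption, the inferred counts reproduce both the 88.0% preliminary accuracy and the 92.7% refined accuracy, so the three percentages in the summary are mutually consistent.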