Abstract:
Objective: The use of electronic health records (EHRs) has the potential to enhance clinical trial activities. However, identifying eligible patients within EHRs presents considerable challenges. Our objective was to develop an eligibility criteria phenotyping pipeline that identifies patients with matching clinical characteristics from EHRs.
Material and methods: In this study, we used clinical trial eligibility criteria from ClinicalTrials.gov and patient EHR datasets from the Sema4 data warehouse, which includes multiple health provider datasets. To ensure computability and queryability, the eligibility criteria attributes and the clinical characteristics in EHRs were normalized using four national standard terminologies (LOINC, ICD-9-CM, ICD-10-CM, and CPT) along with four in-house knowledge bases containing procedures, medications, biomarkers, and diagnosis modifiers. The process was semi-automated, combining rule-based, pattern recognition, and manual annotation methods. The quality of machine-normalized criteria attributes was assessed using Cohen's kappa coefficient on randomly selected criteria, and the accuracy of matching between normalized criteria and patient clinical characteristics was evaluated using precision, recall, and F1 score on randomly selected patients.
Results: A total of 640 unique eligibility criteria attributes were identified, covering five types of cancer (non-small cell lung cancer, small cell lung cancer, prostate cancer, breast cancer, and multiple myeloma), two autoimmune diseases (ulcerative colitis and Crohn's disease), one metabolic disorder (non-alcoholic steatohepatitis), and one rare disease (sickle cell anemia). Of these attributes, 367 were normalized: 174 were encoded with standard terminologies and 193 were normalized using the in-house reference tables. The agreement between automated and manually annotated normalized codes was 0.82, and matching between eligibility criteria attributes and patient clinical information achieved an F1 score of 0.94.
Conclusion: We established a clinical phenotyping pipeline that facilitates effective communication between eligibility criteria and EHRs. The pipeline demonstrated its generalizability by being applied to EHR data from different institutes. Our pipeline shows the potential to significantly enhance the use of EHRs in clinical trial activities and to improve patient matching and selection processes, thereby advancing clinical research and patient outcomes.
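The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the placeholder code labels ("A", "B"), and the patient identifiers are all hypothetical and chosen only to show how Cohen's kappa (agreement between machine and manual normalization) and precision, recall, and F1 (matching accuracy) are typically computed.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for two sets of patient identifiers."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data (assumption): codes assigned to four criteria
# by the automated normalizer and by a human annotator.
machine_codes = ["A", "A", "B", "B"]
manual_codes = ["A", "A", "B", "A"]
kappa = cohens_kappa(machine_codes, manual_codes)

# Hypothetical toy data (assumption): patients matched by a pipeline
# versus patients confirmed eligible by chart review.
matched = {"p1", "p2", "p3", "p4"}
confirmed = {"p2", "p3", "p4", "p5"}
precision, recall, f1 = precision_recall_f1(matched, confirmed)

print(kappa, precision, recall, f1)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over simple percent agreement when a few codes dominate the annotations.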

Parsing Clinical Trial Eligibility Criteria for Cohort Query by a Multi-Input Multi-Output Sequence Labeling Model

Transformer-based Named Entity Recognition for Parsing Clinical Trial Eligibility Criteria

An Ensemble Learning Strategy for Eligibility Criteria Text Classification for Clinical Trial Recruitment: Algorithm Development and Validation (Preprint)

An Ensemble Learning Strategy for Eligibility Criteria Text Classification for Clinical Trial Recruitment: Algorithm Development and Validation

Machine learning and natural language processing in clinical trial eligibility criteria parsing: a scoping review

Analysis of Eligibility Criteria Clusters Based on Large Language Models for Clinical Trial Design

Towards Formal Computable Representation of Clinical Trial Eligibility Criteria for Alzheimer's Disease

Attention-Based LSTM Network for COVID-19 Clinical Trial Parsing

[Artificial intelligence based Chinese clinical trials eligibility criteria classification]

Criteria2Query: a natural language interface to clinical databases for cohort definition

Text Classification of Cancer Clinical Trial Eligibility Criteria

EliXR: an approach to eligibility criteria extraction and representation

Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model

AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models

Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generation

A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases

CriteriaMapper: establishing the automatic identification of clinical trial cohorts from electronic health records by matching normalized eligibility criteria and patient clinical characteristics

Automated Matching of Patients to Clinical Trials: A Patient-Centric Natural Language Processing Approach for Pediatric Leukemia

Utilizing Large Language Models for Enhanced Clinical Trial Matching: A Study on Automation in Patient Screening

A query interface for clinical research with Chinese electronic health record using Natural Language Processing

Establishing the Automatic Identification of Clinical Trial Cohorts from Electronic Health Records by Matching Normalized Eligibility Criteria and Patient Clinical Characteristics