Establishing the Automatic Identification of Clinical Trial Cohorts from Electronic Health Records by Matching Normalized Eligibility Criteria and Patient Clinical Characteristics
Kyeryoung Lee, Yun Mai, Zongzhi Liu, Kalpana Raja, Meng Ma, Tongyu Wang, Lei Ai, Ediz Calay, William Oh, Eric Schadt, Xiaoyan Wang
DOI: https://doi.org/10.1101/2024.02.28.24303396
2024-07-05
Abstract:
Objective
The use of electronic health records (EHRs) holds promising potential to enhance clinical trial activities. However, the identification of eligible patients within EHRs presents considerable challenges. Our objective was to develop an eligibility criteria phenotyping pipeline that would identify patients with matching clinical characteristics from EHRs.
Material and methods
In this study, we used clinical trial eligibility criteria from ClinicalTrials.gov and patient EHR datasets from the Sema4 data warehouse, which includes datasets from multiple health providers. To ensure computability and queryability, the eligibility criteria attributes and the clinical characteristics in EHRs were normalized using four national standard terminologies (LOINC, ICD-9-CM, ICD-10-CM, and CPT) along with four in-house knowledge bases containing procedures, medications, biomarkers, and diagnosis modifiers. The process was semi-automated, combining rule-based, pattern recognition, and manual annotation methods. The quality of machine-normalized criteria attributes was assessed using Cohen's kappa coefficient on randomly selected criteria, and the accuracy of our matching between normalized criteria and patient clinical characteristics was evaluated using precision, recall, and F1 score on randomly selected patients.
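The evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of normalized codes as plain strings, and the representation of matched patients as sets of identifiers are all assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    labels_a, labels_b: equal-length sequences of categorical labels,
    e.g. the code assigned by the machine and by a human annotator
    (hypothetical input format).
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for a set of predicted matches.

    predicted: set of patient identifiers the pipeline matched (assumed format).
    actual: set of patient identifiers confirmed by manual review (assumed format).
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly preferred over simple percent agreement when comparing machine and manual annotations.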
Results
A total of 640 unique eligibility criteria attributes were identified, covering various medical conditions, including five types of cancer (non-small cell lung cancer, small cell lung cancer, prostate cancer, breast cancer, and multiple myeloma), two autoimmune diseases (ulcerative colitis and Crohn's disease), one metabolic disorder (non-alcoholic steatohepatitis), and one rare disease (sickle cell anemia). Of these attributes, 367 were normalized: 174 were encoded with standard terminologies and 193 were normalized using the in-house reference tables. Agreement between automated and manually annotated normalized codes was 0.82 (Cohen's kappa), and matching between eligibility criteria attributes and patient clinical information achieved an F1 score of 0.94.
Conclusion
We established a clinical phenotyping pipeline that facilitates effective communication between eligibility criteria and EHRs. Applying the pipeline to EHR data from different institutions demonstrated its generalizability. Our pipeline shows the potential to significantly enhance the use of EHRs in clinical trial activities and to improve patient matching and selection, thereby advancing clinical research and patient outcomes.