A VAC4EU Systematic Review to Summarize and Critically Appraise Existing Phenotype Libraries Using Electronic Health Records

Sima Mohammadi, Cori Campbell, Miriam CJM Sturkenboom, Tiago A. Vaz
DOI: https://doi.org/10.1101/2024.12.16.24319076
2024-12-16
Abstract:
Background: Pharmacoepidemiology and population health studies using secondary analysis of electronic health care records (EHR) must define study variables through the available electronic data. Defining a study variable starts with identifying a phenotype, which is a defined set of criteria used to identify specific traits or medical conditions. From a real-world data perspective, a phenotype library is a collection of code lists or algorithms that standardize these sets of criteria. We conducted a systematic review of existing phenotype libraries to appraise their attributes, accessibility, interoperability, and portability.
Methods: We systematically searched three databases (Scopus, PubMed, and Web of Science) until June 2024 to identify studies on key characteristics of phenotype libraries. The search combined MeSH terms related to "electronic health records," "phenotype algorithm," and "phenotype library". Extracted study parameters included library size, vocabularies, phenotype construction tools, validation and library management processes, and portability across sites.
Findings: Of 134 articles, 26 met eligibility criteria, of which nine related to eight unique phenotype libraries: CALIBER (also known as the Health Data Research UK (HDR UK) Phenotype Library), Centralized Interactive Phenomics Resource (CIPHER), ClinicalCodes Library, Manitoba Centre for Health Policy (MCHP) Concept Dictionary, Observational Health Data Sciences and Informatics (OHDSI) ATLAS, Open CodeLists, Phenotype Execution and Modeling Architecture (PhEMA) Workbench, and Phenotype KnowledgeBase (PheKB). These libraries varied widely in size and vocabularies. Each library created rule-based phenotypes, though OHDSI and CIPHER also used machine learning. All libraries are both human- and machine-readable. Validation processes varied and were applied to only some libraries. All libraries used a web-based platform and met at least the minimum requirements for library management, including phenotype definitions, metadata (if applicable), and version control.
Interpretations: We observed large variations in library features, including phenotype construction. Transparent phenotype definitions and computable phenotypes enhance portability and streamline the reuse of phenotypes across different systems.
Funding: This investigation was supported by a fellowship awarded by VAC4EU (Vaccine Collaboration for Europe), "Phenotype Representation Model: An International and Streamlined Approach to Enhance RWE Studies" (grant nr 2023/0001).
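To make the idea of a rule-based, computable phenotype concrete, the following is a minimal sketch of a phenotype expressed as a versioned code list and applied to EHR-style records. It is not taken from any of the reviewed libraries: the `CodeListPhenotype` class, the `identify_patients` helper, the record layout, and the patient records are all hypothetical and synthetic, chosen only for illustration. The one real-world detail is the use of ICD-10 category E11 for type 2 diabetes mellitus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeListPhenotype:
    """A rule-based phenotype as a versioned code list (hypothetical structure)."""
    name: str
    vocabulary: str
    version: str
    codes: frozenset

    def matches(self, record):
        # A record matches when it uses the same vocabulary and its code
        # starts with any listed code, so "E11" also covers "E11.9".
        return record["vocabulary"] == self.vocabulary and any(
            record["code"].startswith(code) for code in self.codes
        )


def identify_patients(phenotype, records):
    """Return the sorted IDs of patients with at least one matching record."""
    return sorted({r["patient_id"] for r in records if phenotype.matches(r)})


# Illustrative phenotype: ICD-10 category E11 (type 2 diabetes mellitus).
t2dm = CodeListPhenotype(
    name="type_2_diabetes",
    vocabulary="ICD-10",
    version="1.0.0",
    codes=frozenset({"E11"}),
)

# Synthetic records, invented for this example only.
records = [
    {"patient_id": 1, "vocabulary": "ICD-10", "code": "E11.9"},
    {"patient_id": 2, "vocabulary": "ICD-10", "code": "I10"},
    {"patient_id": 3, "vocabulary": "ICD-10", "code": "E10.9"},
    {"patient_id": 1, "vocabulary": "ICD-10", "code": "E11.2"},
]

print(identify_patients(t2dm, records))  # [1]
```

Keeping the vocabulary and version alongside the codes reflects the library management attributes the review examined (phenotype definitions, metadata, and version control). A real library would likely carry richer metadata and validation information than this sketch does.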