Abstract: The use of a standards-based, modular architecture for the development of phenotype algorithms enhances the interoperability of electronic health record (EHR) systems and allows the dissemination of algorithms across institutions. Here we describe the implementation of previously proposed modules of a comprehensive solution for the development, validation, execution and dissemination of EHR-driven phenotype algorithms.

Introduction: The use of electronic health records (EHRs) for research has been a focus of the biomedical informatics research community, with many researchers and consortia describing methodologies for effective use of EHR data, as well as challenges discovered along the way[1, 2]. To aid in the development of phenotype algorithms using clinical data, several software solutions have been provided to the informatics community, including Informatics for Integrating Biology and the Bedside (i2b2)[3] and Observational Health Data Sciences and Informatics (OHDSI)[4]. We previously described the Phenotype Execution and Modeling Architecture (PhEMA), a modular software architecture that relies on components that interoperate using standard formats and interfaces[5]. Each component is logically separated and completes a specific task, such as executing an algorithm and collecting the results. Having referenced existing systems while developing our proposed architecture, we noted limitations and gaps that we sought to address, specifically around increasing the use of standards and providing flexibility in configuring components to meet each institution's needs.

Methods: The PhEMA development team identified available software systems for many of the seven proposed architecture components for EHR phenotyping (Library for Artifacts, Authoring, Clinical Data Repository, Execution, Validation, Data Model Services and Terminology Services), and developed new software where no existing solution was available. During development, we designed the systems around concrete interfaces and specifications, but remained agnostic to the choice of programming language or development environment.

Results: The PhEMA solution includes one or more implemented components, as shown in Figure 1. A demonstration system and source code are available from the project website (http://projectphema.org). Briefly, each of the implemented solutions is as follows:

Terminology Services: Our use of the Quality Data Model (QDM) relies on value sets (collections of terms that represent concepts, derived from standard vocabularies). We provide users with read access to the NLM-hosted Value Set Authority Center (VSAC) for existing value sets, and also provide a separate read/write instance of a repository for custom value sets. Both repositories leverage the Common Terminology Services 2 (CTS2) standard[6] (the VSAC CTS2 service utilizes a CTS2 wrapper [VSMC]). During installation, the authoring tool may be configured to use one or both repositories.

Figure 1. Implemented components of the Phenotype Execution and Modeling Architecture (PhEMA). Blue boxes indicate newly developed software, while white boxes indicate existing solutions.
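The Terminology Services arrangement described above (a read-only shared repository, a read/write repository for custom value sets, and an authoring tool configured with one or both) can be illustrated with a minimal Python sketch. Every name here (`ValueSet`, `ReadOnlyRepository`, `WritableRepository`, `TerminologyService`) and all identifiers and codes are hypothetical placeholders; the sketch does not reproduce the CTS2 API, the VSAC service, or the actual PhEMA implementation, and it uses in-memory dictionaries in place of remote services.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Protocol


@dataclass(frozen=True)
class ValueSet:
    """A named collection of codes representing one concept (hypothetical shape)."""
    identifier: str
    name: str
    codes: FrozenSet[str]


class ValueSetRepository(Protocol):
    """The only interface the authoring side depends on: lookup by identifier."""

    def find(self, identifier: str) -> Optional[ValueSet]: ...


class ReadOnlyRepository:
    """Stands in for a shared, read-only source of existing value sets."""

    def __init__(self, value_sets: Iterable[ValueSet] = ()) -> None:
        self._store = {vs.identifier: vs for vs in value_sets}

    def find(self, identifier: str) -> Optional[ValueSet]:
        return self._store.get(identifier)


class WritableRepository(ReadOnlyRepository):
    """Stands in for a local read/write repository of custom value sets."""

    def save(self, value_set: ValueSet) -> None:
        self._store[value_set.identifier] = value_set


class TerminologyService:
    """Searches the configured repositories in order; one or several may be given."""

    def __init__(self, repositories: Iterable[ValueSetRepository]) -> None:
        self._repositories = list(repositories)
        if not self._repositories:
            raise ValueError("at least one repository must be configured")

    def find(self, identifier: str) -> Optional[ValueSet]:
        for repository in self._repositories:
            value_set = repository.find(identifier)
            if value_set is not None:
                return value_set
        return None


# Usage: configure both repositories, as an installation might choose to do.
shared = ReadOnlyRepository(
    [ValueSet("shared-1", "Example shared value set", frozenset({"code-a", "code-b"}))]
)
custom = WritableRepository()
custom.save(ValueSet("custom-1", "Example custom value set", frozenset({"code-x"})))

service = TerminologyService([shared, custom])
```

Because the service depends only on the `find` interface, a deployment could pass a single repository or both without changing the calling code, which mirrors the configuration flexibility the architecture aims for.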

The Phenotype Execution and Modeling Architecture (PhEMA) – A Standards-Based Composition of Software for Phenotype Algorithm Development
