Abstract:

Background: The harmonization and standardization of digital medical information for research purposes is a challenging and ongoing collaborative effort. Current research data repositories typically require extensive effort to harmonize and transform the original clinical data. The Fast Healthcare Interoperability Resources (FHIR) format was designed primarily to represent clinical processes; it therefore closely resembles the clinical data model and is more widely available across modern electronic health records. However, no common standardized data format is directly suitable for statistical analysis, so data must be preprocessed first.

Objective: This study aimed to elucidate how FHIR data can be queried directly with a preprocessing service and used for statistical analyses.

Methods: We propose that the binary JavaScript Object Notation format of the open source PostgreSQL (PSQL) database is suitable not only for storing FHIR data but also for extending it with preprocessing and filtering services, which directly transform data stored in FHIR format into prepared data subsets for statistical analysis. We specified an interface for this preprocessor, implemented and deployed it at University Hospital Erlangen-Nürnberg, generated 3 sample data sets, and analyzed the available data.

Results: We imported real-world patient data from 2016 to 2018 into a standard PSQL database, generating a dataset of approximately 35.5 million FHIR resources, including “Patient,” “Encounter,” “Condition” (diagnoses specified using International Classification of Diseases codes), “Procedure,” and “Observation” (laboratory test results). We then integrated the developed preprocessing service with the PSQL database and the locally installed web-based KETOS analysis platform. Advanced statistical analyses with the developed framework proved feasible in 3 clinically relevant scenarios: data-driven establishment of hemoglobin reference intervals, assessment of anemia prevalence in patients with cancer, and investigation of the adverse effects of drugs.

Conclusions: This study shows how the standard open source database PSQL can be used to store FHIR data and be integrated with a specifically developed preprocessing and analysis framework. This enables dataset generation with advanced medical criteria and the integration of subsequent statistical analysis. The web-based preprocessing service can be deployed locally at the hospital level, protecting patients’ privacy while being integrated with existing open source data analysis tools currently being developed across Germany.
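
To make the preprocessing idea concrete, the following is a minimal sketch of criteria-based selection and flattening of FHIR resources, written in plain Python so it runs without a database. It is not the study's implementation: the sample resources, the `has_code` and `preprocess` helpers, and the output column names are all hypothetical. In a PostgreSQL deployment, a comparable filter could presumably be expressed against a `jsonb` column with the containment operator `@>`, but the table layout used in the study is not described here.

```python
# Hypothetical, hand-made FHIR-like resources for illustration only;
# these are not data from the study.
RESOURCES = [
    {"resourceType": "Patient", "id": "p1", "gender": "female"},
    {"resourceType": "Patient", "id": "p2", "gender": "male"},
    {"resourceType": "Observation", "id": "o1",
     "subject": {"reference": "Patient/p1"},
     "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
     "valueQuantity": {"value": 13.1, "unit": "g/dL"}},
    {"resourceType": "Observation", "id": "o2",
     "subject": {"reference": "Patient/p2"},
     "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
     "valueQuantity": {"value": 15.2, "unit": "g/dL"}},
    {"resourceType": "Observation", "id": "o3",
     "subject": {"reference": "Patient/p1"},
     "code": {"coding": [{"system": "http://loinc.org", "code": "2345-7"}]},
     "valueQuantity": {"value": 95, "unit": "mg/dL"}},
]


def has_code(resource, system, code):
    """Return True if any coding of the resource matches system and code."""
    codings = resource.get("code", {}).get("coding", [])
    return any(c.get("system") == system and c.get("code") == code
               for c in codings)


def preprocess(resources, system, code):
    """Select Observations with the given code and flatten them into rows.

    Each row joins the observation value with the gender of the referenced
    Patient, giving a tabular subset suitable for statistical analysis.
    """
    patients = {r["id"]: r for r in resources
                if r.get("resourceType") == "Patient"}
    rows = []
    for r in resources:
        if r.get("resourceType") != "Observation":
            continue
        if not has_code(r, system, code):
            continue
        # FHIR references look like "Patient/<id>"; keep only the id part.
        patient_id = r["subject"]["reference"].split("/", 1)[1]
        quantity = r.get("valueQuantity", {})
        rows.append({
            "patient_id": patient_id,
            "gender": patients.get(patient_id, {}).get("gender"),
            "observation_id": r["id"],
            "value": quantity.get("value"),
            "unit": quantity.get("unit"),
        })
    return rows


# LOINC 718-7 is hemoglobin (mass/volume) in blood.
rows = preprocess(RESOURCES, "http://loinc.org", "718-7")
for row in rows:
    print(row)
```

The design choice illustrated is that the filter criteria (here a single LOINC code) are parameters of the preprocessor, so the same stored FHIR data can yield different analysis-ready subsets without being transformed into a separate research data model first.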

A scalable approach for critical care data extraction and analysis in an academic medical center

Harnessing Big Data in Critical Care: Exploring a new European Dataset

Exploratory Electronic Health Record Analysis with Ehrapy

Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open Data Analysis Platform

Establishment of a Chinese Critical Care Database from Electronic Healthcare Records in a Tertiary Care Medical Center

A highly scalable repository of waveform and vital signs data from bedside monitoring devices

The eICU Collaborative Research Database, a freely available multi-center database for critical care research

Depression and anxiety in heart failure.

A Methodology for a Scalable, Collaborative, and Resource-Efficient Platform to Facilitate Healthcare AI Research

A Scalable Data Science Platform for Healthcare and Precision Medicine Research

Unlocking the Potential of Secondary Data for Public Health Research: Retrospective Study With a Novel Clinical Platform

Next-generation study databases require FAIR, EHR-integrated, and scalable Electronic Data Capture for medical documentation and decision support

An open-source framework for end-to-end analysis of electronic health record data

Data integration between clinical research and patient care: A framework for context-depending data sharing and in silico predictions

Intravascular fasciitis.

Enabling scalable clinical interpretation of ML-based phenotypes using real world data

A plan to defeat neglected tropical diseases.

A Framework for Criteria-Based Selection and Processing of Fast Healthcare Interoperability Resources (FHIR) Data for Statistical Analysis: Design and Implementation Study

Comparison of Peripheral Blood with Heart Blood in Guinea Pigs.

Correction: mTOR Inhibition Attenuates Dextran Sulfate Sodium-Induced Colitis by Suppressing T Cell Proliferation and Balancing TH1/TH17/Treg Profile

An Intelligent Search & Retrieval System (IRIS) and Clinical and Research Repository for Decision Support Based on Machine Learning and Joint Kernel-based Supervised Hashing