Cumulus: a federated electronic health record-based learning system powered by Fast Healthcare Interoperability Resources and artificial intelligence

Andrew J McMurry, Daniel I Gottlieb, Timothy A Miller, James R Jones, Ashish Atreja, Jennifer Crago, Pankaja M Desai, Brian E Dixon, Matthew Garber, Vladimir Ignatov, Lyndsey A Kirchner, Philip R O Payne, Anil J Saldanha, Prabhu R V Shankar, Yauheni V Solad, Elizabeth A Sprouse, Michael Terry, Adam B Wilcox, Kenneth D Mandl
DOI: https://doi.org/10.1093/jamia/ocae130
2024-08-01
Abstract: Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open-source, cloud-hosted app "listener" that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). Methods: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata, including local utility, control, and administrative simplicity, as well as privacy preservation during robust data sharing and artificial intelligence (AI) for processing unstructured text. Results: Cumulus relies on containerized, cloud-hosted software installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across 5 healthcare systems, each partnered with public health. Cumulus outputs patient counts, which were aggregated into a table stratified by variables of interest to enable population health studies. All code is available as open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. Discussion and conclusion: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.
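The aggregate-only output described in the abstract can be illustrated with a minimal sketch. It assumes the Bulk FHIR export arrives as NDJSON (one resource per line) and that the strata of interest are patient gender and birth decade. The function name count_patients, the choice of strata, and the sample records are hypothetical and are not taken from the Cumulus codebase.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def count_patients(
    ndjson_lines: Iterable[str],
    strata: Tuple[str, ...] = ("gender", "birth_decade"),
) -> Counter:
    """Count distinct Patient resources per stratum (hypothetical helper).

    Only the resulting counts, never row-level records, would be shared.
    """
    seen_ids = set()
    counts: Counter = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        resource = json.loads(line)
        # Skip anything that is not a Patient resource.
        if resource.get("resourceType") != "Patient":
            continue
        # Count each patient id once, even if it appears in several files.
        patient_id = resource.get("id")
        if patient_id in seen_ids:
            continue
        seen_ids.add(patient_id)
        birth_date = resource.get("birthDate", "")
        decade = f"{birth_date[:3]}0s" if len(birth_date) >= 4 else "unknown"
        row = {
            "gender": resource.get("gender", "unknown"),
            "birth_decade": decade,
        }
        counts[tuple(row[name] for name in strata)] += 1
    return counts


# Made-up sample records, for illustration only.
SAMPLE = [
    '{"resourceType": "Patient", "id": "a", "gender": "female", "birthDate": "1984-05-01"}',
    '{"resourceType": "Patient", "id": "b", "gender": "female", "birthDate": "1989-11-23"}',
    '{"resourceType": "Patient", "id": "c", "gender": "male", "birthDate": "1990-02-14"}',
    '{"resourceType": "Patient", "id": "a", "gender": "female", "birthDate": "1984-05-01"}',
    '{"resourceType": "Observation", "id": "o1"}',
]

counts = count_patients(SAMPLE)
for stratum, n in sorted(counts.items()):
    print(stratum, n)
```

In this sketch the row-level records stay in local memory and only the stratified count table is produced, which mirrors the stated policy that only aggregate data leave the institution. A real deployment would likely add safeguards that this sketch omits.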
What problem does this paper attempt to address?