Blockchain-enabled immutable, distributed, and highly available clinical research activity logging system for federated COVID-19 data analysis from multiple institutions

Tsung-Ting Kuo, Anh Pham, Maxim E Edelson, Jihoon Kim, Jason Chan, Yash Gupta, Lucila Ohno-Machado, R2D2 Consortium, David M Anderson, Chandrasekar Balacha, Tyler Bath, Sally L Baxter, Andrea Becker-Pennrich, Douglas S Bell, Elmer V Bernstam, Chau Ngan, Michele E Day, Jason N Doctor, Scott DuVall, Robert El-Kareh, Renato Florian, Robert W Follett, Benjamin P Geisler, Alessandro Ghigi, Assaf Gottlieb, Ludwig C Hinske, Zhaoxian Hu, Diana Ir, Xiaoqian Jiang, Katherine K Kim, Tara K Knight, Jejo D Koola, Nelson Lee, Ulrich Mansmann, Michael E Matheny, Daniella Meeker, Zongyang Mou, Larissa Neumann, Nghia H Nguyen, Anderson Nick, Eunice Park, Paulina Paul, Mark J Pletcher, Kai W Post, Clemens Rieder, Clemens Scherer, Lisa M Schilling, Andrey Soares, Spencer SooHoo, Ekin Soysal, Covington Steven, Brian Tep, Brian Toy, Baocheng Wang, Zhen R Wu, Hua Xu, Choi Yong, Kai Zheng, Yujia Zhou, Rachel A Zucker
DOI: https://doi.org/10.1093/jamia/ocad049
2023-05-19
Abstract: Objective: We aimed to develop a distributed, immutable, and highly available cross-cloud blockchain system to facilitate federated data analysis activities among multiple institutions. Materials and methods: We preprocessed 9166 COVID-19 Structured Query Language (SQL) code, summary statistics, and user activity logs from the GitHub repository of the Reliable Response Data Discovery for COVID-19 (R2D2) Consortium. The repository collected local summary statistics from participating institutions and aggregated them into the global result for a COVID-19-related clinical query previously posted by clinicians on a website. We developed both on-chain and off-chain components to store and query these activity logs and their associated queries and results on a blockchain, providing immutability, transparency, and high availability of research communication. We measured the run-time efficiency of contract deployment and network transactions, and confirmed the accuracy of the recorded logs against a centralized baseline solution. Results: Smart contract deployment took 4.5 s on average. Recording an activity log on the blockchain took slightly over 2 s, versus 5-9 s for the baseline. Each query took less than 0.4 s on average on the blockchain, versus around 2.1 s for the baseline. Discussion: The low deployment, recording, and querying times confirm the feasibility of our cross-cloud, blockchain-based federated data analysis system. We have yet to evaluate the system on a larger network with multiple nodes per cloud, to consider how to accommodate a surge in activities, and to investigate methods to lower querying time as the blockchain grows. Conclusion: Blockchain technology can be used to support federated data analysis among multiple institutions.