High performance on-demand de-identification of a petabyte-scale medical imaging data lake

Joseph Mesterhazy, Garrick Olson, Somalee Datta
DOI: https://doi.org/10.48550/arXiv.2008.01827
2020-08-05
Abstract: With the rise of artificial intelligence-driven approaches, researchers are requesting unprecedented volumes of medical imaging data, far exceeding the capacity of traditional on-premises client-server approaches to make the data analysis-ready for research. We are making available a flexible solution for on-demand de-identification that combines mature software technologies with modern cloud-based distributed computing techniques to enable faster turnaround in medical imaging research. The solution is part of a broader platform that supports secure, high-performance clinical data science.
Distributed, Parallel, and Cluster Computing; Performance