Abstract 6242: A workflow execution system in a data fabric for integrative cancer analyses
Aarti Venkat, Pauline Ribeyre, Jawad Qureshi, Sai Shanmukha Narumanchi, J Montgomery Maxwell, Bill Winslow, Sara Volk Garcia, Chris Meyer, Tzintzuni Garcia, Peter Vassilatos, Clinton Malson, Zhenyu Zhang, Robert Grossman
DOI: https://doi.org/10.1158/1538-7445.am2024-6242
IF: 11.2
2024-03-23
Cancer Research
Abstract: Cancer researchers are increasingly conducting multi-omic research and performing integrative analyses on combinations of genomic, proteomic, transcriptomic, imaging, single-cell and other data modalities. However, it is challenging for a researcher to effectively access, aggregate and analyze data and metadata from different data sources in a scalable and reproducible manner, because the individual datasets may be disconnected from one another and have separate authentication and authorization requirements. To overcome these challenges for academic researchers, we developed a workflow execution system in a data fabric called the Biomedical Research Hub (BRH). The BRH is powered by Gen3, an open-source, Kubernetes-based software stack that allows cancer researchers to create their own data fabric and interoperate with data from multiple data sources. The workflow execution system uses Nextflow and allows researchers to run containerized applications in the cloud in a secure, isolated environment. Data from multiple resources can be combined for analysis using convenient payment models, including NIH STRIDES. We plan to demonstrate our system on a scientific use case involving Clonal Hematopoiesis of Indeterminate Potential (CHIP), a phenomenon that has been associated with aging, cancer, cardiovascular disease, infection and all-cause mortality. We run containerized CHIP workflows in the cloud using two datasets accessible through BRH: (i) the Genomic Data Commons, the world's largest source of harmonized cancer data, and (ii) BioDataCatalyst, an NHLBI ecosystem that drives discovery and innovation for heart, lung, blood and sleep disorders. Our system is ideally suited for machine learning on large aggregated cancer datasets and for federated learning tasks.
Citation Format: Aarti Venkat, Pauline Ribeyre, Jawad Qureshi, Sai Shanmukha Narumanchi, J Montgomery Maxwell, Bill Winslow, Sara Volk Garcia, Chris Meyer, Tzintzuni Garcia, Peter Vassilatos, Clinton Malson, Zhenyu Zhang, Robert Grossman. A workflow execution system in a data fabric for integrative cancer analyses [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6242.
oncology