Abstract 3547: The ISB Cancer Gateway in the Cloud (ISB-CGC): Access, explore and analyze large-scale cancer data through the Google Cloud

Fabian Seidl, Lauren Hagen, Jacob Wilson, Boris Aguilar, Deena Bleich, Lauren Wolfe, Poojitha Gundluru, Prema Venkatesan, Mi Tian, Suzanne Paquette, Elaine Lee, Danna Huffman, David Pot, William Longabaugh
DOI: https://doi.org/10.1158/1538-7445.am2024-3547
IF: 11.2
2024-03-22
Cancer Research
Abstract: Rapid growth of cancer data in recent decades has made data discovery and wrangling difficult for the average cancer research lab. Our mission at the ISB Cancer Gateway in the Cloud (ISB-CGC), part of the NCI’s Cancer Research Data Commons ecosystem, is to democratize access to large cancer datasets. Funded by the NCI, we have performed ETL processes on data from GDC and PDC projects such as TCGA, TARGET, and CPTAC. We generated hundreds of BigQuery tables containing data such as mutations, gene expression, and protein abundance, which enable data analysis in the cloud via SQL. BigQuery analyses are inexpensive and rapid even when scaled to petabyte-sized inputs; for example, we ran 6.6 billion correlations in 2.5 hours at a total cost of about one dollar. These data can also be accessed affordably from Google Cloud VMs, where researchers can develop analysis pipelines in Python, R, and workflow languages such as CWL. We present two recent collaborations. In one, BigQuery was used to develop machine learning algorithms that calculated genetic risk scores from TCGA glioblastoma and ovarian cancer copy number variation. In the other, researchers combined SQL queries of our BigQuery tables with data from the ISPY2 Trial initiative and generated an R Shiny app that can dynamically create data visualizations for genes of interest in different TCGA cohorts.
Citation Format: Fabian Seidl, Lauren Hagen, Jacob Wilson, Boris Aguilar, Deena Bleich, Lauren Wolfe, Poojitha Gundluru, Prema Venkatesan, Mi Tian, Suzanne Paquette, Elaine Lee, Danna Huffman, David Pot, William Longabaugh. The ISB Cancer Gateway in the Cloud (ISB-CGC): Access, explore and analyze large-scale cancer data through the Google Cloud [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3547.
oncology