Abstract: Multi-omics research approaches (genomics, transcriptomics, epigenomics, proteomics, metabolomics, etc.) are vital for understanding the hierarchical complexity of human biology and have proven extremely valuable in cancer research and precision medicine. Recent scientific advances have made high-throughput genome-wide sequencing a central focus of molecular research by allowing the collective analysis of various kinds of molecular data from different types of specimens, within a single tissue or even at the level of a single cell. Additionally, with improved computational resources and data mining, researchers can integrate data from different omics regimes to identify new prognostic, diagnostic, or predictive biomarkers, uncover novel therapeutic targets, and develop more personalized treatment protocols for patients. For the research community to extract scientifically and clinically meaningful information from the biological data generated each day efficiently and with fewer wasted resources, familiarity and comfort with advanced analytical tools, such as Google Cloud Platform, become imperative. This project is an interdisciplinary, cross-organizational effort to provide a guided learning module that integrates transcriptomics and epigenetics data analysis protocols into a comprehensive analysis pipeline, built on Google Cloud computing infrastructure, for users to implement in their own work. The learning module consists of three submodules that guide the user through tutorial examples illustrating the analysis of RNA sequencing (RNA-seq) and Reduced-Representation Bisulfite Sequencing (RRBS) data. The examples take the form of breast cancer case studies, and the data sets were obtained from the public repository Gene Expression Omnibus.
The first submodule is devoted to transcriptomics analysis of the RNA sequencing data, the second submodule focuses on epigenetics analysis of the DNA methylation data, and the third submodule integrates the two analyses for a deeper biological understanding. Each submodule begins with data collection and preprocessing, and further downstream analysis is performed in a Vertex AI Jupyter notebook instance with an R kernel. Analysis results are written to Google Cloud buckets for storage and visualization, removing the computational strain from local resources. The final product is a start-to-finish tutorial that lets researchers with limited experience in multi-omics integrate transcriptomics and epigenetics data analysis into a comprehensive pipeline for their own biological research. This manuscript describes the development of a resource module that is part of a learning platform named ``NIGMS Sandbox for Cloud-based Learning'' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [16] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.
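To make the integration step of the third submodule more concrete, the following is a minimal sketch of one common way to combine per-gene expression and methylation results. It is not the module's actual code: the module runs in an R kernel, while this illustration uses Python. The function name `integrate_by_gene`, the cutoffs, the pattern labels, and all gene names and values are hypothetical. The sketch assumes that each analysis has already produced one number per gene (a log2 fold change for expression, a methylation difference in percentage points for methylation).

```python
from typing import Dict, List, NamedTuple


class IntegratedGene(NamedTuple):
    gene: str
    log2_fold_change: float
    methylation_diff: float
    pattern: str


def integrate_by_gene(
    expression: Dict[str, float],
    methylation: Dict[str, float],
    lfc_cutoff: float = 1.0,    # hypothetical cutoff on |log2 fold change|
    meth_cutoff: float = 10.0,  # hypothetical cutoff on |methylation difference| (%)
) -> List[IntegratedGene]:
    """Join two per-gene result tables and label inverse patterns."""
    results = []
    # Only genes present in both tables can be compared.
    for gene in sorted(expression.keys() & methylation.keys()):
        lfc = expression[gene]
        diff = methylation[gene]
        if diff >= meth_cutoff and lfc <= -lfc_cutoff:
            pattern = "hypermethylated_downregulated"
        elif diff <= -meth_cutoff and lfc >= lfc_cutoff:
            pattern = "hypomethylated_upregulated"
        else:
            pattern = "other"
        results.append(IntegratedGene(gene, lfc, diff, pattern))
    return results


# Made-up values for illustration only; not taken from the module's data sets.
toy_expression = {"GENE_A": -2.3, "GENE_B": 1.8, "GENE_C": 0.2, "GENE_D": -1.5}
toy_methylation = {"GENE_A": 25.0, "GENE_B": -18.0, "GENE_C": 30.0, "GENE_E": 12.0}

integrated = integrate_by_gene(toy_expression, toy_methylation)
for row in integrated:
    print(row.gene, row.pattern)
```

Genes that appear in only one of the two tables are dropped by the join, so the result describes only genes measured by both assays. In practice the cutoffs would be chosen together with significance thresholds from the upstream analyses.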

Acceleration and automation of genomic data analysis to meet corporate compliance standards using advanced cloud components.

A cost-effective approach to improving performance of big genomic data analyses in clouds

SCAN: A Smart Application Platform for Empowering Parallelizations of Big Genomic Data Analysis in Clouds

NGScloud2: optimized bioinformatic analysis using Amazon Web Services

CloudATAC: a cloud-based framework for ATAC-Seq data analysis

FASTGenomics: An analytical ecosystem for single-cell RNA sequencing data

Transcriptomics and epigenetic data integration learning module on Google Cloud

Rapid NGS Analysis on Google Cloud Platform: performance benchmark and user tutorial

Scalable and efficient DNA sequencing analysis on different compute infrastructures aiding variant discovery

DNAscan: a fast, computationally and memory efficient bioinformatics pipeline for the analysis of DNA next-generation-sequencing data

miCloud: A Plug-n-Play, Extensible, On-Premises Bioinformatics Cloud for Seamless Execution of Complex Next-Generation Sequencing Data Analysis Pipelines

GeneCloudOmics: A Data Analytic Cloud Platform for High-Throughput Gene Expression Analysis

A graphical, interactive and GPU-enabled workflow to process long-read sequencing data

Massive Genomic Data Processing and Deep Analysis

Reusable tutorials for using cloud-based computing environments for the analysis of bacterial gene expression data from bulk RNA sequencing

A Fully Integrated End-to-End Genome Analysis Accelerator for Next-Generation Sequencing

SEAseq: a portable and cloud-based chromatin occupancy analysis suite

QuickRNASeq Lifts Large-Scale RNA-seq Data Analyses to the Next Level of Automation and Interactive Visualization

Interoperable RNA-Seq analysis in the cloud

Harmonizing and integrating the NCI Genomic Data Commons through accessible, interactive, and cloud-enabled workflows

Custom Biomedical FAIR Data Analysis in the Cloud Using CAVATICA