TCGAnalyzeR: An Online Pan-Cancer Tool for Integrative Visualization of Molecular and Clinical Data of Cancer Patients for Cohort and Associated Gene Discovery

Talip Zengin, Başak Abak Masud, Tuğba Önal-Süzek
DOI: https://doi.org/10.3390/cancers16020345
2024-01-14
Cancers
Abstract: For humans, the parallel processing capability of visual recognition allows for faster comprehension of complex scenes and patterns. This is essential, especially for clinicians interpreting big data, for whom visualization tools play an even more vital role in transforming raw big data into clinical decision making by managing the inherent complexity and monitoring patterns interactively in real time. The size and data variety of The Cancer Genome Atlas (TCGA) database challenge the effective utilization of this valuable resource by clinicians and biologists. We re-analyzed five molecular data types, i.e., mutation, transcriptome profile, copy number variation, miRNA, and methylation data, of ~11,000 cancer patients across all 33 cancer types and integrated the existing TCGA patient cohorts from the literature into a free and efficient web application: TCGAnalyzeR. TCGAnalyzeR provides an integrative visualization of pre-analyzed TCGA data with several novel modules: (i) simple nucleotide variations with driver prediction; (ii) recurrent copy number alterations; (iii) differential expression in tumor versus normal, with pathway and survival analysis; (iv) TCGA clinical data, including metastasis and survival analysis; (v) external subcohorts from the literature and from the curatedTCGAData and BiocOncoTK R packages; (vi) internal patient clusters determined using the iClusterPlus R package or signature-based expression analysis of the five molecular data types. TCGAnalyzeR integrated the multi-omics, pan-cancer TCGA data with ~120 subcohorts from the literature along with clipboard panels, allowing users to create their own subcohorts, compare them against existing external subcohorts (MSI, Immune, PAM50, Triple Negative, IDH1, miRNA, metastasis, etc.) and our internal patient clusters, and visualize cohort-centric or gene-centric results interactively.
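The abstract lists survival analysis in both the expression and clinical modules but does not describe the method. As a minimal sketch, assuming each patient has a single follow-up time and a death/censoring flag, the Kaplan-Meier estimator and a two-group log-rank test (standard tools for comparing survival between, for example, a subcohort and the remaining patients) can be written in plain Python. This is not taken from TCGAnalyzeR; the `Patient` record and the function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record: follow-up time (e.g., days) and whether death was observed.
    time: float
    event: bool  # True = death observed, False = censored


def kaplan_meier(patients):
    """Return [(time, survival_probability)] at each distinct event time."""
    event_times = sorted({p.time for p in patients if p.event})
    surv = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for p in patients if p.time >= t)
        deaths = sum(1 for p in patients if p.event and p.time == t)
        # Product-limit step: multiply by the conditional survival at time t.
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(group_a, group_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value, 1 df)."""
    everyone = group_a + group_b
    event_times = sorted({p.time for p in everyone if p.event})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for p in group_a if p.time >= t)
        n = sum(1 for p in everyone if p.time >= t)
        d_a = sum(1 for p in group_a if p.event and p.time == t)
        d = sum(1 for p in everyone if p.event and p.time == t)
        # Observed minus expected deaths in group A under equal hazards.
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of deaths in group A at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For a group with deaths at days 5, 8, and 20 and one patient censored at day 12, `kaplan_meier` yields survival of 0.75, 0.5, and 0.0 at those three times. Keeping the estimator and the test in separate functions mirrors how survival plots and their p-values are usually reported side by side.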
What problem does this paper attempt to address?
The paper aims to address the following key issues:

### Main Issues Addressed by the Paper

1. **Integration and Visualization Issue**: Given vast cancer genomic datasets such as the TCGA database, how to effectively integrate multiple types of molecular data (mutations, transcriptome, copy number variations, miRNA expression, methylation, etc.) with clinical data, and present them in an intuitive, user-friendly way that supports clinical decision-making.
2. **Sub-cohort Analysis Issue**: How to identify and compare patient sub-cohorts with specific characteristics (such as microsatellite instability, immune phenotype, or metastasis) across cancer types, to better understand these subgroups and their impact on treatment response.
3. **Evaluation of Personalized Medical Tests**: Given the various commercially available personalized tumor diagnostic tests based on multi-gene variant detection (such as the Oncomine Dx Target Test and Oncotype DX), how to evaluate their clinical performance compared with traditional single-gene tests.

### Specific Objectives

- Develop an online pan-cancer tool named TCGAnalyzeR that integrates and visualizes molecular and clinical data from the TCGA database, facilitating the discovery of patient cohorts and associated genes.
- Integrate 123 pre-computed pan-cancer cohorts (including microsatellite instability, immune features, metastasis, PAM50, triple-negative breast cancer, and IDH1-mutant glioblastoma) and provide internally computed sub-cohorts.
- Provide a user-friendly, customizable interface that lets users select their own "My Patients" or "My Genes" sets and add them to the clipboard for further analysis.
- Support visualization across multiple modules, including simple nucleotide variations (SNV), copy number variations (CNV), differential expression analysis, and clinical data (including metastasis and survival analysis).
- Integrate external sub-cohorts from multiple publications so that users can create their own sub-cohorts, compare them with existing cohorts, and visualize results interactively from a cohort-centric or gene-centric perspective.

In summary, this study aims to enable effective use of large cancer genomic datasets through the TCGAnalyzeR tool, a data analysis platform that helps clinicians and researchers advance precision medicine.
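The clipboard workflow lets users define a "My Patients" set and compare it with existing external subcohorts, but the summary does not say which statistic is used. One common way to ask whether such a set overlaps a subcohort (for example, an MSI subcohort) more than chance would predict is a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test. The sketch below assumes that choice; `overlap_enrichment` and the patient IDs are hypothetical and not part of TCGAnalyzeR.

```python
from math import comb


def overlap_enrichment(my_patients, subcohort, universe):
    """Hypothetical helper: one-sided hypergeometric test for the overlap
    between a user-selected patient set and an existing subcohort.
    Returns (overlap_size, p_value), where p_value = P(X >= overlap)."""
    background = set(universe)
    my_set = set(my_patients) & background  # ignore IDs outside the background
    sub_set = set(subcohort) & background
    big_n = len(background)    # all patients considered
    big_k = len(sub_set)       # patients in the subcohort
    n = len(my_set)            # patients the user selected
    k = len(my_set & sub_set)  # observed overlap
    # Sum the hypergeometric probabilities for overlaps of k or more.
    # math.comb returns 0 when the second argument exceeds the first.
    tail = sum(
        comb(big_k, i) * comb(big_n - big_k, n - i)
        for i in range(k, min(big_k, n) + 1)
    )
    return k, tail / comb(big_n, n)
```

With a background of 10 patients, a 5-patient subcohort, and a 5-patient selection that matches it exactly, the overlap is 5 and the p-value is 1/252; a selection disjoint from the subcohort gives a p-value of 1.0. Restricting both sets to the chosen background matters, because the p-value depends on which patients are counted as possible members.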