Abstract

BACKGROUND: The Cancer Genome Atlas (TCGA) has collected transcriptome, genome and epigenome information for over 20 cancers from thousands of patients. Because these diverse data types are available, they need to be combined to capture the heterogeneity of biological processes and phenotypes, and then to identify homogeneous subtypes of cancers such as breast cancer. Many multi-view clustering approaches have been proposed to discover clusters across different data types. The problem is challenging when different data types show poor agreement in clustering structure.

RESULTS: In this work, we first propose a multi-view clustering approach with consensus (CMC), which tries to find consensus kernels among views by using the Hilbert-Schmidt Independence Criterion (HSIC). To handle the case where agreement among views is poor, we further propose a multi-view clustering approach with enhanced consensus (ECMC), which decomposes the kernel information in each view into a consensus part and a disagreement part. The consensus parts of different views are supposed to be similar, and the disagreement parts should be independent of the consensus parts. Both the CMC and ECMC models can be solved by alternating updates with semi-definite programming. Our experiments on both simulated datasets and real-world benchmark datasets show that the ECMC model can achieve higher clustering accuracy than other state-of-the-art multi-view clustering approaches. We also apply the ECMC model to integrate mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer datasets, and the survival analysis shows that our ECMC model outperforms other methods in identifying cancer subtypes.
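CMC relies on HSIC as a kernel-based dependence measure between views. As a minimal sketch, assuming the standard biased empirical estimator HSIC(K, L) = tr(KHLH) / (n - 1)^2 with centering matrix H = I - (1/n) 11^T (Gretton et al., 2005) and a Gaussian kernel per view, the code below computes HSIC between two kernel matrices built on the same samples. It illustrates only the dependence measure, not the CMC/ECMC optimization itself, which is solved by alternating updates with semi-definite programming. The helper names `center_kernel`, `hsic`, `rbf_kernel` and the `gamma` parameter are hypothetical.

```python
import math


def center_kernel(k):
    """Return the double-centred kernel H K H, where H = I - (1/n) 11^T."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[k[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(k, l):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(k)
    k_centered = center_kernel(k)
    # By the cyclic trace property, tr(K H L H) = tr(H K H L)
    # = sum_ij (HKH)_ij * L_ji.
    trace = sum(k_centered[i][j] * l[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


def rbf_kernel(points, gamma=1.0):
    """Gaussian (RBF) kernel matrix for a list of feature vectors (hypothetical helper)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]
```

For example, two views whose samples fall into the same two groups give a larger HSIC than a pair of views whose groupings disagree, which is the kind of cross-view agreement a consensus kernel is meant to capture.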
Using Fisher's combination test, we found that three computed subtypes roughly correspond to three known breast cancer subtypes: luminal B, HER2 and basal-like.

CONCLUSION: Integrating heterogeneous TCGA datasets with our proposed multi-view clustering approach, ECMC, can effectively identify cancer subtypes.
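The abstract does not specify which p-values were combined when matching computed subtypes to known ones, so the sketch below illustrates only Fisher's method itself: k independent p-values are combined into X = -2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom under the joint null hypothesis. It assumes independent p-values and uses the closed-form chi-squared tail for even degrees of freedom so that only the Python standard library is needed; the function name `fisher_combined_pvalue` is hypothetical.

```python
import math


def fisher_combined_pvalue(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-squared(2k) under H0."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For 2k degrees of freedom the chi-squared survival function is
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total
```

For instance, two independent p-values of 0.05 combine to a p-value of roughly 0.0175, stronger evidence than either test alone.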

Heterogeneity Analysis Via Integrating Multi-Sources High-Dimensional Data with Applications to Cancer Studies

Robust Analysis of Cancer Heterogeneity for High‐dimensional Data

Network-based cancer heterogeneity analysis incorporating multi-view of prior information

Incorporating prior information in gene expression network-based cancer heterogeneity analysis

Robust structured heterogeneity analysis approach for high-dimensional data

Biomarker-guided heterogeneity analysis of genetic regulations via multivariate sparse fusion

Tumor Heterogeneity in Gastrointestinal Cancer Based on Multimodal Data Analysis

Regression‐based heterogeneity analysis to identify overlapping subgroup structure in high‐dimensional data

Unfolding the mysteries of heterogeneity from a high-resolution perspective: integration analysis of single-cell multi-omics and spatial omics revealed functionally heterogeneous cancer cells in ccRCC

Spatial transcriptomics inferred from pathology whole-slide images links tumor heterogeneity to survival in breast and lung cancer

Heterogeneity in Primary Tumors and Corresponding Metastases: Could It Provide Us with Any Hints to Personalize Cancer Therapy?

Systems Heterogeneity: an Integrative Way to Understand Cancer Heterogeneity

Robust nonparametric integrative analysis to decipher heterogeneity and commonality across subgroups using sparse boosting

Heterogeneity Between Primary Colon Carcinoma and Paired Lymphatic and Hepatic Metastases

HSSG: Identification of Cancer Subtypes Based on Heterogeneity Score of A Single Gene

Integrative Analysis Of High-Throughput Cancer Studies With Contrasted Penalization

Cell Heterogeneity Analysis in Single-Cell RNA-seq Data Using Mixture Exponential Graph and Markov Random Field Model

Revealing genomic heterogeneity and commonality: A penalized integrative analysis approach accounting for the adjacency structure of measurements

Integrative analysis and variable selection with multiple high-dimensional data sets

Subtype Identification from Heterogeneous TCGA Datasets on a Genomic Scale by Multi-View Clustering with Enhanced Consensus

A clustering approach to integrative analyses of multiomic cancer data