Abstract: It is now clear that major malignancies are heterogeneous diseases associated with diverse molecular properties and clinical outcomes, posing a great challenge for more individualized therapy. In the last decade, cancer molecular subtyping studies were mostly based on transcriptomic profiles, ignoring heterogeneity at other (epi)genetic levels of gene regulation. Integrating multiple types of (epi)genomic data generates a more comprehensive landscape of biological processes, providing an opportunity to better dissect cancer heterogeneity. Here, we propose sparse canonical correlation analysis for cancer classification (SCCA-CC), which projects each type of single-omics data onto a unified space for data fusion, followed by clustering and classification analysis. Without loss of generality, as case studies, we integrated two types of omics data, mRNA and miRNA profiles, for molecular classification of ovarian cancer (n = 462) and breast cancer (n = 451). The two types of omics data were projected onto a unified space using SCCA, followed by data fusion to identify cancer subtypes. The subtypes we identified recapitulated subtypes previously recognized by other groups (all P-values < 0.001) but displayed more significant clinical associations. In ovarian cancer especially, the four subtypes we identified were significantly associated with overall survival, whereas the taxonomy previously established by TCGA was not (P-values: 0.039 vs. 0.12). The multi-omics classifiers we established not only classified individual types of data but also demonstrated higher accuracies on the fused data. Compared with iCluster, SCCA-CC identified subtypes of higher coherence and clinical relevance, and did so with greater time efficiency. In conclusion, we developed an integrated bioinformatic framework, SCCA-CC, for cancer molecular subtyping. Using two case studies in breast and ovarian cancer, we demonstrated its effectiveness in identifying biologically meaningful and clinically relevant subtypes. SCCA-CC presented a unique advantage in its ability to classify both single-omics data and multi-omics data, which significantly extends its applicability to various data types and makes more efficient use of published omics resources.
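The project, fuse, then cluster pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the SCCA formulation, the fusion rule, or the clustering algorithm, so everything below is an assumption made for illustration: sparse canonical vectors are estimated by alternating soft-thresholded power iterations on the cross-covariance matrix, fusion is a plain average of the standardized canonical scores, and clustering is a basic Lloyd's k-means. The function names, the `frac` sparsity knob, and the synthetic "mRNA" and "miRNA" matrices are hypothetical and are not taken from SCCA-CC.

```python
import numpy as np


def standardize(M):
    """Center each column and scale it to unit variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0)


def soft_threshold(a, frac):
    """Shrink entries toward zero by frac * max|a| (hypothetical sparsity knob)."""
    lam = frac * np.max(np.abs(a))
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def unit(v):
    """Scale a vector to unit L2 norm; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_cca(X, Y, n_components=2, frac=0.3, n_iter=50):
    """Sparse canonical vectors via alternating soft-thresholded updates."""
    C = X.T @ Y / X.shape[0]  # cross-covariance of the standardized data
    U, V = [], []
    for _component in range(n_components):
        v = np.linalg.svd(C, full_matrices=False)[2][0]  # dense starting point
        for _step in range(n_iter):
            u = unit(soft_threshold(C @ v, frac))
            v = unit(soft_threshold(C.T @ u, frac))
        U.append(u)
        V.append(v)
        C = C - (u @ C @ v) * np.outer(u, v)  # deflate before the next pair
    return np.column_stack(U), np.column_stack(V)


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; a stand-in for whatever clustering the paper uses."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _step in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


# Synthetic stand-ins for the two omics layers: 60 samples in 3 groups,
# with the group signal carried by the first 5 features of each matrix.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 20)
shift = np.array([-3.0, 0.0, 3.0])[groups]
mrna = rng.normal(size=(60, 50))
mirna = rng.normal(size=(60, 30))
mrna[:, :5] += shift[:, None]
mirna[:, :5] += shift[:, None]

X, Y = standardize(mrna), standardize(mirna)
U, V = sparse_cca(X, Y)

# Assumed fusion rule: average the standardized canonical scores of both layers.
fused = (standardize(X @ U) + standardize(Y @ V)) / 2
labels = kmeans(fused, k=3)
```

Because each layer has its own projection matrix (`U` for the first, `V` for the second), a sample measured on only one omics type can still be mapped into the shared space, which is one plausible reading of how a framework of this kind could handle both single-omics and fused inputs.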

moCluster: Identifying Joint Patterns Across Multiple Omics Data Sets

Clustering single-cell multi-omics data with MoClust

Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration

Robust joint clustering of multi-omics single-cell data via multi-modal high-order neighborhood Laplacian Matrix optimization

A Clustering Approach to Integrative Analysis of Multiomic Cancer Data

A clustering approach to integrative analyses of multiomic cancer data

Model-based multifacet clustering with high-dimensional omics applications

Sparse integrative clustering of multiple omics data sets

Integrative Model-based clustering of microarray methylation and expression data

scMCs: a framework for single-cell multi-omics data integration and multiple clusterings

Omada: robust clustering of transcriptomes through multiple testing

A multivariate approach to the integration of multi-omics datasets

Multi-Omics Data Fusion for Cancer Molecular Subtyping Using Sparse Canonical Correlation Analysis

MONET: Multi-omic module discovery by omic selection

Integrative clustering of multi-level ‘omic data based on non-negative matrix factorization algorithm

MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling

Using clusterProfiler to characterize multiomics data

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms

Identification of functional gene modules by integrating multi-omics data and known molecular interactions

DEMOC: a deep embedded multi-omics learning approach for clustering single-cell CITE-seq data