Abstract: It is now clear that major malignancies are heterogeneous diseases associated with diverse molecular properties and clinical outcomes, posing a great challenge for more individualized therapy. In the last decade, cancer molecular subtyping studies were mostly based on transcriptomic profiles, ignoring heterogeneity at other (epi-)genetic levels of gene regulation. Integrating multiple types of (epi)genomic data generates a more comprehensive landscape of biological processes, providing an opportunity to better dissect cancer heterogeneity. Here, we propose sparse canonical correlation analysis for cancer classification (SCCA-CC), which projects each type of single-omics data onto a unified space for data fusion, followed by clustering and classification analysis. Without loss of generality, as case studies, we integrated two types of omics data, mRNA and miRNA profiles, for molecular classification of ovarian cancer (n = 462) and breast cancer (n = 451). The two types of omics data were projected onto a unified space using SCCA, followed by data fusion to identify cancer subtypes. The subtypes we identified recapitulated subtypes previously recognized by other groups (all P-values < 0.001), but displayed more significant clinical associations. In ovarian cancer in particular, the four subtypes we identified were significantly associated with overall survival, whereas the taxonomy previously established by TCGA was not (P-values: 0.039 vs. 0.12). The multi-omics classifiers we established not only classified individual types of data but also achieved higher accuracy on the fused data. Compared with iCluster, SCCA-CC identified subtypes of higher coherence and clinical relevance, and did so with better time efficiency. In conclusion, we developed an integrated bioinformatic framework, SCCA-CC, for cancer molecular subtyping. Using two case studies in breast and ovarian cancer, we demonstrated its effectiveness in identifying biologically meaningful and clinically relevant subtypes. SCCA-CC has a unique advantage in its ability to classify both single-omics and multi-omics data, which significantly extends its applicability to various data types and makes more efficient use of published omics resources.
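The project-fuse-cluster idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' SCCA-CC implementation. It assumes a rank-1 sparse CCA computed by alternating soft-thresholded updates with fixed thresholds (a simplification of penalized matrix decomposition), fusion by concatenating the two canonical variates, and a tiny two-cluster k-means. All function names, parameters (`lam_u`, `lam_v`), and the synthetic "mRNA-like" and "miRNA-like" matrices are hypothetical.

```python
import numpy as np

def standardize(M):
    """Center each feature and scale it to unit variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0)

def soft_threshold(a, lam):
    """Shrink entries toward zero; entries with |a| <= lam become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit(v):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def sparse_cca_rank1(X, Y, lam_u=0.5, lam_v=0.5, n_iter=50):
    """Rank-1 sparse CCA sketch; thresholds are assumed, not tuned."""
    X, Y = standardize(X), standardize(Y)
    C = X.T @ Y / X.shape[0]  # cross-correlation matrix (features of X by features of Y)
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # leading right singular vector
    for _ in range(n_iter):
        u = unit(soft_threshold(C @ v, lam_u))
        v = unit(soft_threshold(C.T @ u, lam_v))
    # Return sparse weights and each data type projected onto the shared axis.
    return u, v, X @ u, Y @ v

def two_means(Z, n_iter=20):
    """Tiny 2-cluster k-means with deterministic farthest-point initialization."""
    c0 = Z[0]
    c1 = Z[np.argmax(np.linalg.norm(Z - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        centers = np.stack([Z[labels == k].mean(axis=0) for k in (0, 1)])
    return labels

def make_toy_data(n=200, p=30, q=20, seed=0):
    """Synthetic two-subtype data: only a few features in each matrix carry signal."""
    rng = np.random.default_rng(seed)
    z = np.repeat([-1.0, 1.0], n // 2)   # hidden subtype indicator
    X = rng.normal(size=(n, p))          # stand-in for mRNA profiles
    Y = rng.normal(size=(n, q))          # stand-in for miRNA profiles
    X[:, :5] += 2.0 * z[:, None]
    Y[:, :4] += 2.0 * z[:, None]
    return X, Y, (z > 0).astype(int)

X, Y, truth = make_toy_data()
u, v, x_scores, y_scores = sparse_cca_rank1(X, Y)
fused = np.column_stack([x_scores, y_scores])  # assumed fusion: concatenate variates
labels = two_means(fused)
print("non-zero weights:", np.count_nonzero(u), "of", u.size)
```

Because both data types are mapped onto the same axis, a classifier trained on the projected scores could in principle be applied to either data type alone or to the fused representation, which is the property the abstract highlights. In practice the thresholds would need to be tuned (for example by cross-validation) and more than one canonical component would typically be extracted.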

scMCs: a framework for single-cell multi-omics data integration and multiple clusterings

scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization

MultiSC: a deep learning pipeline for analyzing multiomics single-cell data

scMLC: an accurate and robust multiplex community detection method for single-cell multi-omics data

scICML: Information-Theoretic Co-Clustering-Based Multi-View Learning for the Integrative Analysis of Single-Cell Multi-Omics Data

Multi-Omics Data Fusion for Cancer Molecular Subtyping Using Sparse Canonical Correlation Analysis

Clustering of single-cell multi-omics data with a multimodal deep learning method

Effective multi-modal clustering method via skip aggregation network for parallel scRNA-seq and scATAC-seq data

Orthogonal multimodality integration and clustering in single-cell data

scAMACE: Model-based approach to the joint analysis of single-cell data on chromatin accessibility, gene expression and methylation

Clustering CITE-seq data with a canonical correlation-based deep learning method

Robust joint clustering of multi-omics single-cell data via multi-modal high-order neighborhood Laplacian Matrix optimization

moSCminer: a cell subtype classification framework based on the attention neural network integrating the single-cell multi-omics dataset on the cloud

Clustering single-cell multi-omics data with MoClust

scMODAL: A general deep learning framework for comprehensive single-cell multi-omics data alignment with feature links

SCSMD: Single Cell Consistent Clustering based on Spectral Matrix Decomposition

Learning Consistency and Specificity of Cells from Single-cell Multi-omic Data

Clustering single-cell multi-omics data via graph regularized multi-view ensemble learning

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

DEMOC: a deep embedded multi-omics learning approach for clustering single-cell CITE-seq data

A New Graph Autoencoder-Based Multi-level Kernel Subspace Fusion Framework for Single-cell Type Identification