Abstract: It is now clear that major malignancies are heterogeneous diseases associated with diverse molecular properties and clinical outcomes, posing a great challenge for more individualized therapy. In the last decade, cancer molecular subtyping studies were mostly based on transcriptomic profiles, ignoring heterogeneity at other (epi)genetic levels of gene regulation. Integrating multiple types of (epi)genomic data generates a more comprehensive landscape of biological processes, providing an opportunity to better dissect cancer heterogeneity. Here, we propose sparse canonical correlation analysis for cancer classification (SCCA-CC), which projects each type of single-omics data onto a unified space for data fusion, followed by clustering and classification analysis. As case studies, and without loss of generality, we integrated two types of omics data, mRNA and miRNA profiles, for molecular classification of ovarian cancer (n = 462) and breast cancer (n = 451). The two types of omics data were projected onto a unified space using SCCA, followed by data fusion to identify cancer subtypes. The subtypes we identified recapitulated subtypes previously recognized by other groups (all P-values < 0.001) but displayed more significant clinical associations. In ovarian cancer in particular, the four subtypes we identified were significantly associated with overall survival, whereas the taxonomy previously established by TCGA was not (P-values: 0.039 vs. 0.12). The multi-omics classifiers we established not only classified individual types of data but also achieved higher accuracy on the fused data. Compared with iCluster, SCCA-CC identified subtypes of higher coherence and clinical relevance, and did so with better time efficiency. In conclusion, we developed an integrated bioinformatic framework, SCCA-CC, for cancer molecular subtyping. Using two case studies in breast and ovarian cancer, we demonstrated its effectiveness in identifying biologically meaningful and clinically relevant subtypes. SCCA-CC has a unique advantage in its ability to classify both single-omics and multi-omics data, which significantly extends its applicability to various data types and makes more efficient use of published omics resources.
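The project-then-fuse idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a rank-1 sparse CCA solved by alternating soft-thresholding (one common formulation), synthetic stand-ins for the mRNA and miRNA matrices, fusion by averaging the two projected scores, and a crude sign-based split in place of the clustering step. The function names (`standardize`, `soft_threshold`, `sparse_cca_rank1`) and the threshold fractions are hypothetical choices made for illustration.

```python
import numpy as np

def standardize(M):
    """Center each column and scale it to unit variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0)

def soft_threshold(a, lam):
    """Shrink entries toward zero; entries with |a| <= lam become exactly zero."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_cca_rank1(X, Y, frac_u=0.3, frac_v=0.3, n_iter=50):
    """Rank-1 sparse CCA sketch via alternating soft-thresholded updates.

    X is samples x p and Y is samples x q, both column-standardized.
    frac_u and frac_v (assumed values) set each threshold as a fraction of
    the largest absolute entry, so at least one weight always survives.
    """
    C = X.T @ Y  # p x q cross-product matrix
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # start from leading right singular vector
    for _ in range(n_iter):
        a = C @ v
        u = soft_threshold(a, frac_u * np.abs(a).max())
        u /= np.linalg.norm(u)
        b = C.T @ u
        v = soft_threshold(b, frac_v * np.abs(b).max())
        v /= np.linalg.norm(v)
    return u, v

# Synthetic data (assumption): 100 samples in two hidden groups, with a few
# informative features in each omics layer and pure noise elsewhere.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 50)
z = 2.0 * truth - 1.0
mrna = rng.normal(size=(100, 50))
mrna[:, :5] += 2.0 * z[:, None]
mirna = rng.normal(size=(100, 40))
mirna[:, :4] += 2.0 * z[:, None]

X, Y = standardize(mrna), standardize(mirna)
u, v = sparse_cca_rank1(X, Y)

# Project each omics layer onto its sparse canonical direction.
x_scores, y_scores = X @ u, Y @ v
corr = np.corrcoef(x_scores, y_scores)[0, 1]

# Fuse by averaging the rescaled scores (an assumed fusion rule), then split
# samples by sign as a stand-in for a real clustering algorithm.
fused = (x_scores / x_scores.std() + y_scores / y_scores.std()) / 2
labels = (fused > 0).astype(int)
agreement = max(np.mean(labels == truth), np.mean(labels != truth))

print("nonzero mRNA weights:", np.count_nonzero(u))
print("nonzero miRNA weights:", np.count_nonzero(v))
print("canonical correlation:", round(float(corr), 3))
print("agreement with hidden groups:", agreement)
```

Because both layers are projected into the same low-dimensional space, a sample with only one omics type available could in principle still be scored with the corresponding weight vector alone, which is the intuition behind classifying single-omics as well as fused data.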

Dynamic Meta-data Network Sparse PCA for Cancer Subtype Biomarker Screening

Supervised Discriminative Sparse PCA for Com-Characteristic Gene Selection and Tumor Classification on Multiview Biological Data

Multi-Omics Data Fusion via a Joint Kernel Learning Model for Cancer Subtype Discovery and Essential Gene Identification

PCA-constrained multi-core matrix fusion network: A novel approach for cancer subtype identification

Multi-scale supervised clustering-based feature selection for tumor classification and identification of biomarkers and targets on genomic data

PACS: Prediction and analysis of cancer subtypes from multi-omics data based on a multi-head attention mechanism model

Subtype-DCC: decoupled contrastive clustering method for cancer subtype identification based on multi-omics data

Latent space search based multimodal optimization with personalized edge-network biomarker for multi-purpose early disease prediction

Supervised clustering of high-dimensional data using regularized mixture modeling

Unsupervised Analysis Based on DCE-MRI Radiomics Features Revealed Three Novel Breast Cancer Subtypes with Distinct Clinical Outcomes and Biological Characteristics

An improved cancer diagnosis algorithm for protein mass spectrometry based on PCA and a one-dimensional neural network combining ResNet and SENet

Deep multi-view contrastive learning for cancer subtype identification

Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network

DSCENet: Dynamic Screening and Clinical-Enhanced Multimodal Fusion for MPNs Subtype Classification

A deep learning approach based on multi-omics data integration to construct a risk stratification prediction model for skin cutaneous melanoma

Multi-Omics Data Fusion for Cancer Molecular Subtyping Using Sparse Canonical Correlation Analysis

Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach

A Contrastive-Learning-Based Deep Neural Network for Cancer Subtyping by Integrating Multi-Omics Data

Enhancing Characteristic Gene Selection and Tumor Classification by the Robust Laplacian Supervised Discriminative Sparse PCA

Multi-layer matrix factorization for cancer subtyping using full and partial multi-omics dataset

Network-based Distance Metric with Application to Discover Disease Subtypes in Cancer