Abstract: It is now clear that major malignancies are heterogeneous diseases associated with diverse molecular properties and clinical outcomes, posing a great challenge for more individualized therapy. In the last decade, cancer molecular subtyping studies were mostly based on transcriptomic profiles, ignoring heterogeneity at other (epi)genetic levels of gene regulation. Integrating multiple types of (epi)genomic data generates a more comprehensive landscape of biological processes, providing an opportunity to better dissect cancer heterogeneity. Here, we propose sparse canonical correlation analysis for cancer classification (SCCA-CC), which projects each type of single-omics data onto a unified space for data fusion, followed by clustering and classification analysis. Without loss of generality, as case studies, we integrated two types of omics data, mRNA and miRNA profiles, for molecular classification of ovarian cancer (n = 462) and breast cancer (n = 451). The two types of omics data were projected onto a unified space using SCCA, followed by data fusion to identify cancer subtypes. The subtypes we identified recapitulated subtypes previously recognized by other groups (all P-values < 0.001), but displayed more significant clinical associations. In ovarian cancer in particular, the four subtypes we identified were significantly associated with overall survival, whereas the taxonomy previously established by TCGA was not (P-values: 0.039 vs. 0.12). The multi-omics classifiers we established could not only classify individual types of data but also achieved higher accuracies on the fused data. Compared with iCluster, SCCA-CC identified subtypes with higher coherence and clinical relevance, and did so with better time efficiency. In conclusion, we developed an integrated bioinformatic framework, SCCA-CC, for cancer molecular subtyping. Using two case studies in breast and ovarian cancer, we demonstrated its effectiveness in identifying biologically meaningful and clinically relevant subtypes. SCCA-CC has a unique advantage in its ability to classify both single-omics and multi-omics data, which significantly extends its applicability to various data types and makes more efficient use of published omics resources.
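The projection-and-fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a penalized-matrix-decomposition style sparse CCA with a relative soft-threshold (`lam_x`, `lam_y` are hypothetical tuning parameters, not values from the paper), and it assumes fusion by averaging the standardized paired projections, which the abstract does not specify. All function names are illustrative.

```python
import numpy as np

def soft_threshold(a, frac):
    """Shrink entries of a toward zero by frac * max|a| (frac in [0, 1))."""
    lam = frac * np.abs(a).max()
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit_norm(a):
    n = np.linalg.norm(a)
    return a / n if n > 0 else a

def zscore(M):
    return (M - M.mean(axis=0)) / (M.std(axis=0) + 1e-12)

def sparse_cca(X, Y, n_components=2, lam_x=0.3, lam_y=0.3, n_iter=100):
    """Sparse CCA sketch: alternating soft-thresholded power iterations
    on the cross-covariance matrix, with deflation between components.
    X is samples x features (e.g. mRNA), Y is samples x features (e.g. miRNA)."""
    Xs, Ys = zscore(X), zscore(Y)
    C = Xs.T @ Ys / (Xs.shape[0] - 1)           # p x q cross-covariance
    U = np.zeros((Xs.shape[1], n_components))   # sparse weights for X
    V = np.zeros((Ys.shape[1], n_components))   # sparse weights for Y
    for k in range(n_components):
        # Initialise v with the leading right singular vector of C.
        _, _, vt = np.linalg.svd(C, full_matrices=False)
        v = vt[0]
        for _ in range(n_iter):
            u = unit_norm(soft_threshold(C @ v, lam_x))
            v = unit_norm(soft_threshold(C.T @ u, lam_y))
        d = u @ C @ v                            # covariance captured by the pair
        C = C - d * np.outer(u, v)               # deflate before the next pair
        U[:, k], V[:, k] = u, v
    return Xs @ U, Ys @ V, U, V

def fuse(PX, PY):
    """Assumed fusion rule: average the standardized paired projections."""
    return (zscore(PX) + zscore(PY)) / 2.0

# Toy usage on synthetic data (two hidden sample groups).
rng = np.random.default_rng(0)
group = np.repeat([-1.5, 1.5], 30)
X = rng.normal(size=(60, 30))
Y = rng.normal(size=(60, 20))
X[:, :5] += group[:, None]                       # informative mRNA-like features
Y[:, :4] += group[:, None]                       # informative miRNA-like features
PX, PY, U, V = sparse_cca(X, Y)
Z = fuse(PX, PY)                                 # samples x components, shared space
```

A clustering step (for example k-means or consensus clustering, chosen here only for illustration) would then be run on the rows of `Z` to propose subtypes. Because the weight matrices `U` and `V` are kept separately, a single-omics sample can also be projected into the same space, which is the property the abstract highlights for classifying individual data types.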
