Abstract: Genetic biomarkers play a pivotal role in the classification, prognostication, and guidance of clinical cancer therapies. Large-scale, multi-dimensional analyses of entire cancer genomes, exemplified by projects such as The Cancer Genome Atlas (TCGA), have produced an extensive repository of data with the potential to reveal the underlying biology of these malignancies. Mutations are the principal drivers of cellular transformation. Nonetheless, other global genomic processes, such as alterations in gene expression and chromosomal rearrangements, also play crucial roles in conferring cellular immortality. Integrating cancer-specific multi-omics data has been shown to improve our understanding of the molecular mechanisms underpinning carcinogenesis. This report shows how integrating comprehensive data on methylation, gene expression, and copy number variation can facilitate the unsupervised clustering of cancer samples. We identified regressors that classify tumor and normal samples using an optimal integration of RNA sequencing, DNA methylation, and copy number variation data, while also achieving significant p-values. These regressors were trained using linear and logistic regression together with k-means clustering. For comparison, we employed autoencoder-based and stacking-based omics integration and computed silhouette scores to evaluate the resulting clusters. The proof of concept is illustrated using liver cancer data. Our analysis underscores the feasibility of unsupervised cancer classification based on genetic markers beyond mutations, and thereby emphasizes the clinical relevance of additional global cellular parameters that contribute to cellular transformation.
This work is clinically relevant because changes in gene expression and genomic rearrangements have been shown to be signatures of cellular transformation across cancers, including liver cancer.
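The stacking-based integration and silhouette evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the layer sizes and the group shift are arbitrary assumptions, and every function name (`simulate_layer`, `zscore_columns`, `stack_layers`, `kmeans`, `silhouette`) is hypothetical. Only the Python standard library is used, and the k-means initialisation is a simple farthest-first choice made for determinism, which is a simplification.

```python
import math
from random import Random
from statistics import mean, pstdev


def simulate_layer(rng, n_per_group, n_features, shift):
    """Toy omics layer: group 0 rows centred at 0, then group 1 rows centred at `shift`."""
    return [[rng.gauss(group * shift, 1.0) for _ in range(n_features)]
            for group in (0, 1) for _ in range(n_per_group)]


def zscore_columns(matrix):
    """Standardise each feature (column) to zero mean and unit variance."""
    cols = list(zip(*matrix))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in matrix]


def stack_layers(*layers):
    """Stacking integration: scale each layer, then concatenate features per sample."""
    scaled = [zscore_columns(layer) for layer in layers]
    return [[v for part in parts for v in part] for parts in zip(*scaled)]


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); assumes at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = {c: [] for c in set(labels)}
        for j, q in enumerate(points):
            if i != j:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the sample's own cluster
        b = min(mean(d) for c, d in dists.items() if c != labels[i] and d)  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Synthetic stand-ins for three omics layers measured on the same 40 samples.
rng = Random(42)
expression = simulate_layer(rng, 20, 6, shift=4.0)
methylation = simulate_layer(rng, 20, 5, shift=4.0)
copy_number = simulate_layer(rng, 20, 4, shift=4.0)

integrated = stack_layers(expression, methylation, copy_number)
labels = kmeans(integrated, k=2)
score = silhouette(integrated, labels)
print(f"clusters: {sorted(set(labels))}, silhouette: {score:.3f}")
```

A silhouette score close to 1 indicates compact, well-separated clusters, while values near 0 indicate overlapping ones. On real data, library implementations such as scikit-learn's `KMeans` and `silhouette_score` would normally replace the hand-written functions above.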
