Abstract: Recent advances in next-generation sequencing (NGS), microarrays and mass spectrometry for omics data production have enabled the generation and collection of different modalities of high-dimensional molecular data. Integrating multiple omics datasets is a statistical challenge because of the limited number of individuals, the high number of variables and the heterogeneity of the datasets to integrate. Many tools have recently been developed to address this problem, including canonical correlation analysis, matrix factorization and SM. These commonly used techniques aim to analyze two or more types of omics simultaneously. In this article, we compare a panel of 13 unsupervised methods based on these different approaches to integrate various types of multi-omics datasets: iClusterPlus, regularized generalized canonical correlation analysis, sparse generalized canonical correlation analysis, multiple co-inertia analysis (MCIA), integrative-NMF (intNMF), SNF, MoCluster, mixKernel, CIMLR, LRAcluster, ConsensusClustering, PINSPlus and multi-omics factor analysis (MOFA). We evaluate the ability of the methods to recover the subgroups and the variables that drive the clustering on eight simulated benchmarks. MOFA does not provide any results on these benchmarks. For clustering, SNF, MoCluster, CIMLR, LRAcluster, ConsensusClustering and intNMF provide the best results. For variable selection, MoCluster outperforms the others. However, the performance of the methods seems to depend on the heterogeneity of the datasets (especially for MCIA, intNMF and iClusterPlus). Finally, we apply the methods to three real studies with heterogeneous data and various phenotypes. We conclude that MoCluster is the best method to analyze these omics data. Availability: An R package named CrIMMix is available on GitHub at https://github.com/CNRGH/crimmix to reproduce all the results of this article.

Clustering single-cell multi-omics data with MoClust

moCluster: Identifying Joint Patterns Across Multiple Omics Data Sets

scMCs: a framework for single-cell multi-omics data integration and multiple clusterings

Orthogonal multimodality integration and clustering in single-cell data

Clustering of single-cell multi-omics data with a multimodal deep learning method

Clustering CITE-seq data with a canonical correlation-based deep learning method

Clustering single-cell multi-omics data via graph regularized multi-view ensemble learning

Robust joint clustering of multi-omics single-cell data via multi-modal high-order neighborhood Laplacian Matrix optimization

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization

Clustering single cell CITE-seq data with a canonical correlation based deep learning method

Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration

A Unified Bayesian Framework for Bi-overlapping-Clustering Multi-omics Data via Sparse Matrix Factorization

CLUEY enables knowledge-guided clustering and cell type detection from single-cell omics data

DEMOC: a deep embedded multi-omics learning approach for clustering single-cell CITE-seq data

ClusterMatch aligns single-cell RNA-sequencing data at the multi-scale cluster level via stable matching

scICML: Information-Theoretic Co-Clustering-Based Multi-View Learning for the Integrative Analysis of Single-Cell Multi-Omics Data

Spectral clustering of single cells using Siamese neural network combined with improved affinity matrix

Model-based multifacet clustering with high-dimensional omics applications

Strategic Multi-Omics Data Integration via Multi-Level Feature Contrasting and Matching

scMLC: an accurate and robust multiplex community detection method for single-cell multi-omics data