Abstract

Motivation: The advent of highly multiplexed in situ imaging cytometry assays has revolutionized the study of cellular systems, offering unparalleled detail in observing cellular activities and characteristics. These assays provide comprehensive insights by concurrently profiling the spatial distribution and molecular features of numerous cells. In navigating this complex data landscape, unsupervised machine learning techniques, particularly clustering algorithms, have become essential tools. They enable the identification and categorization of cell types and subsets based on their molecular characteristics. Despite their widespread adoption, most clustering algorithms in use were initially developed for cell suspension technologies, leading to a potential mismatch in application. There is a critical gap in the systematic evaluation of these methods, particularly in determining the properties that make them optimal for in situ imaging assays. Addressing this gap is vital for ensuring accurate, reliable analyses and fostering advancements in cellular biology research.

Results: In our extensive investigation, we evaluated a range of similarity metrics, which are crucial in determining the relationships between cells during the clustering process. Our findings reveal substantial variations in clustering performance, contingent on the similarity metric employed. These variations underscore the importance of selecting appropriate metrics to ensure accurate cell type and subset identification. In response to these challenges, we introduce FuseSOM, a novel ensemble clustering algorithm that integrates hierarchical multiview learning of similarity metrics with self-organizing maps. Through a rigorous stratified subsampling analysis framework, we demonstrate that FuseSOM outperforms existing best-practice clustering methods specifically tailored for in situ imaging cytometry data. Our work not only provides critical insights into the performance of clustering algorithms in this novel context but also offers a robust solution, paving the way for more accurate and reliable in situ imaging cytometry data analysis.

Availability and implementation: The FuseSOM R package is available on Bioconductor under the GPL-3 license. All code for the analyses performed can be found on GitHub.

Applying Clustering Analysis to Heterogeneous Data Using Similarity Matrix Fusion (SMF)

Clustering by Heterogeneous Data Fusion: Framework and Applications

SMCC: A Novel Clustering Method for Single- and Multi-Omics Data Based on Co-Regularized Network Fusion

Similarity Fusion Via Exploiting High Order Proximity for Cancer Subtyping

SiMilarity-Enhanced Homophily for Multi-View Heterophilous Graph Clustering

Hybrid Fuzzy Cluster Ensemble Framework for Tumor Clustering from Biomolecular Data

Multi-Omics Data Fusion for Cancer Molecular Subtyping Using Sparse Canonical Correlation Analysis

Deep Subspace Similarity Fusion for the Prediction of Cancer Subtypes

Integrating Multi-Omic Data with Deep Subspace Fusion Clustering for Cancer Subtype Prediction

Fusion Matrix–Based Text Similarity Measures for Clustering of Retrieval Results

Heterogeneous Matrix Factorization: When Features Differ by Datasets

Simulation-derived best practices for clustering clinical data

Enhancing MEDLINE Document Clustering by Incorporating MeSH Semantic Similarity

Affinity network fusion and semi-supervised learning for cancer patient clustering

Integrative Subspace Clustering by Common and Specific Decomposition for Applications on Cancer Subtype Identification

scMCs: a framework for single-cell multi-omics data integration and multiple clusterings

The impact of similarity metrics on cell-type clustering in highly multiplexed in situ imaging cytometry data

Multi-scale Geometric Summaries for Similarity-based Sensor Fusion

SA-PSO-GK++: A New Hybrid Clustering Approach for Analyzing Medical Data

Fusing heterogeneous data sets

HBIC: A Biclustering Algorithm for Heterogeneous Datasets