On metrics for subpopulation detection in single-cell and spatial omics data

Siyuan Luo, Pierre-Luc Germain, Ferdinand von Meyenn, Mark D Robinson
DOI: https://doi.org/10.1101/2024.11.28.625845
2024-12-03
Abstract: Benchmarks are crucial to understanding the strengths and weaknesses of the growing number of tools for single-cell and spatial omics analysis. A key task is to distinguish subpopulations within complex tissues, where evaluation typically relies on external clustering validation metrics. Different metrics often lead to inconsistencies between rankings, highlighting the importance of understanding the behavior and biological implications of each metric. In this work, we provide a framework for systematically understanding and selecting validation metrics for single-cell data analysis, addressing tasks such as creating cell embeddings, constructing graphs, clustering, and spatial domain detection. Our discussion centers on the desirable properties of metrics, focusing on biological relevance and potential biases. Using this framework, we not only analyze existing metrics, but also develop novel ones. Delving into domain detection in spatial omics data, we develop new external metrics tailored to spatially-aware measurements. Additionally, an R package, poem, implements all the metrics discussed.
Bioinformatics
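
The abstract's point that different external validation metrics can rank the same clusterings differently can be illustrated with a small sketch. The example below is an assumption-laden illustration, not code from the paper or from the poem package: it uses two textbook metrics (the adjusted Rand index and cluster purity), written in plain Python, and the toy labelings `truth`, `over_split`, and `one_error` are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Pairs of items grouped together in both labelings, in truth, and in pred.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = sum_truth * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one single cluster
    return (sum_cells - expected) / (max_index - expected)


def purity(truth, pred):
    """Fraction of items that belong to the majority class of their cluster."""
    majority = {}
    for (_, cluster), count in Counter(zip(truth, pred)).items():
        majority[cluster] = max(majority.get(cluster, 0), count)
    return sum(majority.values()) / len(truth)


# Hypothetical toy data: eight cells, two true subpopulations.
truth = ["A"] * 4 + ["B"] * 4
over_split = list(range(8))             # every cell in its own cluster
one_error = [0, 0, 0, 0, 1, 1, 1, 0]    # two clusters, one misassigned cell

for name, pred in [("over_split", over_split), ("one_error", one_error)]:
    print(name, round(adjusted_rand_index(truth, pred), 3), purity(truth, pred))
```

In this toy case purity prefers the over-split result, because every singleton cluster is trivially pure, while the adjusted Rand index prefers the two-cluster result with one error. Which ranking is more appropriate depends on the properties one wants from a metric, which is the kind of question the paper's framework is meant to address.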