Unsupervised learning from topological data analysis to identify cellular states from single-cell RNA-seq analysis

Aydolun Petenkaya, Chuansheng Hu, Constantinos Chronis, Zhifeng Shao, Jie Liang
DOI: https://doi.org/10.1016/j.bpj.2022.11.1989
IF: 3.4
2023-01-01
Biophysical Journal
Abstract: Tissues and organs consist of heterogeneous subpopulations of cells, each with a distinctive cellular state, even though they share the same genomic background. Important biological processes such as differentiation, reprogramming, and cancer development are accompanied by changes in the subpopulations of different cellular states. One approach to defining cellular states and quantifying cellular subpopulations is to measure single-cell RNA transcriptomes. However, current practices require a priori biological knowledge to annotate cellular states in a heterogeneous population. In this work, we investigate how cellular states or subpopulations in the transcriptome of peripheral blood mononuclear cells (PBMCs) can be objectively defined without a priori biological information or human intervention. Our approach is based on topological data analysis and persistent homology: we apply a recently developed method to define cellular states and identify cell subpopulations, eliminating the need for user input. To allow accurate identification of cellular states and subpopulations, we explore the effects of embedding the transcriptome into spaces of different dimensionality and of the connectedness criterion used to define the embedded manifold of cellular states. Our analysis of the 3,000-cell PBMC dataset demonstrates that biologically relevant cellular subpopulations can be identified automatically without a priori input of any biological knowledge. These subpopulations include small groups of cells such as FCGR3A+ monocytes and natural killer cells. We discuss how our approach can be used to study other heterogeneous cellular systems.
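The abstract does not specify the algorithm, so the following is only a minimal sketch of the general idea of using persistent homology to pick out subpopulations without a user-chosen cluster count. It is not the authors' method. It assumes cells have already been embedded as low-dimensional points, tracks when connected components (0-dimensional homology classes) merge as the connectedness scale grows, and cuts at the largest gap between merge scales. The function names `zero_dim_deaths` and `cluster_at_largest_gap` are hypothetical.

```python
import math
from itertools import combinations


def _find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def zero_dim_deaths(points):
    """Scales at which connected components merge in a Vietoris-Rips
    filtration (the finite death times of 0-dimensional persistence)."""
    n = len(points)
    parent = list(range(n))
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # this edge joins two components, so one of them dies
            parent[ri] = rj
            deaths.append(d)
    return deaths  # n - 1 values, in nondecreasing order


def cluster_at_largest_gap(points):
    """Label points by connected component at a scale chosen inside the
    largest gap between consecutive merge scales (an assumed heuristic)."""
    deaths = zero_dim_deaths(points)
    if len(deaths) < 2:
        raise ValueError("need at least three points to compare merge scales")
    _, k = max((deaths[m + 1] - deaths[m], m) for m in range(len(deaths) - 1))
    eps = (deaths[k] + deaths[k + 1]) / 2
    n = len(points)
    parent = list(range(n))
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= eps:
            ri, rj = _find(parent, i), _find(parent, j)
            if ri != rj:
                parent[ri] = rj
    roots = {}
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(_find(parent, i), len(roots)) for i in range(n)]


# Toy "embedding": two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels = cluster_at_largest_gap(toy_points)
```

On real single-cell data the pairwise loop would be too slow, and both the embedding dimensionality and the connectedness criterion would need the kind of exploration the abstract describes. The sketch only shows how a persistence gap can stand in for a user-supplied number of clusters.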