A clustering method for small scRNA-seq data based on subspace and weighted distance

Zilan Ning,Zhijun Dai,Hongyan Zhang,Yuan Chen,Zheming Yuan
DOI: https://doi.org/10.7717/peerj.14706
IF: 3.061
2023-01-23
PeerJ
Abstract:Background: Identifying the cell types using unsupervised methods is essential for scRNA-seq research. However, conventional similarity measures introduce challenges to single-cell data clustering because of the high dimensional, high noise, and high dropout. Methods: We proposed a clustering method for small ScRNA-seq data based on Subspace and Weighted Distance (SSWD), which follows the assumption that the sets of gene subspace composed of similar density-distributing genes can better distinguish cell groups. To accurately capture the intrinsic relationship among cells or genes, a new distance metric that combines Euclidean and Pearson distance through a weighting strategy was proposed. The relative Calinski-Harabasz (CH) index was used to estimate the cluster numbers instead of the CH index because it is comparable across degrees of freedom. Results: We compared SSWD with seven prevailing methods on eight publicly scRNA-seq datasets. The experimental results show that the SSWD has better clustering accuracy and the partitioning ability of cell groups. SSWD can be downloaded at https://github.com/ningzilan/SSWD.
What problem does this paper attempt to address?