HGC: fast hierarchical clustering for large-scale single-cell data

Ziheng Zou, Kui Hua, Xuegong Zhang
DOI: https://doi.org/10.1101/2021.02.07.430106
2021-01-01
Abstract: Clustering is a key step in revealing heterogeneity in single-cell data. Cell heterogeneity can be explored at different resolutions, and the resulting cell states are inherently nested. However, most existing single-cell clustering methods output a fixed number of clusters without hierarchical information. Classical hierarchical clustering provides a dendrogram of cells but cannot scale to large datasets because of its high computational complexity. We present HGC, a fast Hierarchical Graph-based Clustering method that addresses both problems by combining the advantages of graph-based clustering and hierarchical clustering. On the shared nearest neighbor graph of cells, HGC constructs the hierarchical tree with linear time complexity. Experiments showed that HGC enables multiresolution exploration of the biological hierarchy underlying the data, achieves state-of-the-art accuracy on benchmark data, and can scale to large datasets. HGC is freely available for academic use at https://www.github.com/XuegongLab/HGC.
Contact: zhangxg@tsinghua.edu.cn, stevenhuakui@gmail.com
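To make the term "shared nearest neighbor graph" concrete, the following is a minimal sketch of one common way to build such a graph: two cells are connected with a weight equal to the Jaccard overlap of their k-nearest-neighbor sets. This is an illustrative assumption, not HGC's implementation; the abstract does not specify the weighting, and the function names `knn_indices` and `snn_graph` are hypothetical. The brute-force distance computation is quadratic and only suitable for small examples.

```python
import numpy as np

def knn_indices(X, k):
    """Return, for each row of X, the indices of its k nearest other rows."""
    # Brute-force pairwise squared Euclidean distances (O(n^2) memory).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def snn_graph(X, k):
    """Dense SNN weight matrix using Jaccard overlap of kNN sets (assumed weighting)."""
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    # member[i, j] is True when j is among the k nearest neighbors of i.
    member = np.zeros((n, n), dtype=bool)
    member[np.arange(n)[:, None], nbrs] = True
    m = member.astype(int)
    shared = m @ m.T            # number of neighbors that i and j have in common
    union = 2 * k - shared      # each neighbor set has exactly k members
    W = shared / union
    np.fill_diagonal(W, 0.0)    # no self-loops
    return W
```

A hierarchical method would then repeatedly merge nodes on this graph to build a dendrogram; how HGC performs that merging is described in the paper itself and is not reproduced here.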