HGC: fast hierarchical clustering for large-scale single-cell data

Ziheng Zou, Kui Hua, Xuegong Zhang
DOI: https://doi.org/10.1093/bioinformatics/btab420
IF: 5.8
2021-01-01
Bioinformatics
Abstract: Clustering is a key step in revealing heterogeneities in single-cell data. Most existing single-cell clustering methods output a fixed number of clusters without the hierarchical information. Classical hierarchical clustering (HC) provides dendrograms of cells, but cannot scale to large datasets due to high computational complexity. We present HGC, a fast Hierarchical Graph-based Clustering tool to address both problems. It combines the advantages of graph-based clustering and HC. On the shared nearest-neighbor graph of cells, HGC constructs the hierarchical tree with linear time complexity. Experiments showed that HGC enables multiresolution exploration of the biological hierarchy underlying the data, achieves state-of-the-art accuracy on benchmark data and can scale to large datasets.
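To make the idea in the abstract concrete, the following is a minimal sketch of hierarchical clustering on a shared nearest-neighbor (SNN) graph. It is not HGC's algorithm and does not run in linear time: it uses a brute-force neighbor search and a naive greedy merge rule (highest average inter-cluster edge weight), both chosen here only for illustration. All function names (knn_sets, snn_graph, graph_hierarchy, cut_tree) and the toy coordinates are assumptions, not part of the paper or the HGC package.

```python
import math
from itertools import combinations


def knn_sets(points, k):
    """Brute-force k nearest neighbours; each set also contains the point itself."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(others[:k]) | {i})
    return neighbours


def snn_graph(points, k):
    """SNN edge weight = Jaccard overlap of the two points' neighbour sets."""
    nb = knn_sets(points, k)
    weights = {}
    for i, j in combinations(range(len(points)), 2):
        shared = len(nb[i] & nb[j])
        if shared:
            weights[(i, j)] = shared / len(nb[i] | nb[j])
    return weights


def graph_hierarchy(n, weights):
    """Greedy agglomeration on the graph (illustrative rule, not HGC's).

    Repeatedly merges the two connected clusters with the highest average
    inter-cluster edge weight. Returns merges as (child_a, child_b, new_id).
    Stops early if the graph is disconnected, leaving several roots.
    """
    size = {i: 1 for i in range(n)}
    link = {frozenset(pair): w for pair, w in weights.items()}
    merges = []
    next_id = n
    while link:
        pair = max(link, key=lambda p: link[p] / math.prod(size[c] for c in p))
        a, b = sorted(pair)
        del link[pair]
        # Re-route every edge that touched a or b to the merged cluster.
        for old in [p for p in link if a in p or b in p]:
            w = link.pop(old)
            (c,) = old - {a, b}
            key = frozenset((next_id, c))
            link[key] = link.get(key, 0.0) + w
        size[next_id] = size.pop(a) + size.pop(b)
        merges.append((a, b, next_id))
        next_id += 1
    return merges


def cut_tree(n, merges, n_clusters):
    """Replay the first n - n_clusters merges to get a flat clustering."""
    members = {i: [i] for i in range(n)}
    for a, b, new in merges[: n - n_clusters]:
        members[new] = members.pop(a) + members.pop(b)
    return sorted(sorted(m) for m in members.values())


# Toy data: two well-separated groups of four points (hypothetical values).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
merges = graph_hierarchy(len(points), snn_graph(points, k=2))
clusters = cut_tree(len(points), merges, n_clusters=2)
print(clusters)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Because the merge history is kept, the same tree can be cut at different depths (for example n_clusters=4) without re-clustering, which is the kind of multiresolution exploration the abstract refers to.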