SAFE-clustering: Single-cell Aggregated (from Ensemble) Clustering for Single-Cell RNA-seq Data
Yuchen Yang, Ruth Huh, Houston W. Culpepper, Yuan Lin, Michael I. Love, Yun Li
DOI: https://doi.org/10.1093/bioinformatics/bty793
2019-01-01
Abstract:
Motivation: Accurately clustering cell types from a mass of heterogeneous cells is a crucial first step in the analysis of single-cell RNA-seq (scRNA-Seq) data. Although several methods have recently been developed, they exploit different characteristics of the data and yield varying results in terms of both the number of clusters and the actual cluster assignments.
Results: Here, we present SAFE-clustering, Single-cell Aggregated (From Ensemble) clustering, a flexible, accurate and robust method for clustering scRNA-Seq data. SAFE-clustering takes as input the results from multiple clustering methods and builds one consensus solution from them. SAFE-clustering currently embeds four state-of-the-art methods (SC3, CIDR, Seurat and t-SNE + k-means) and ensembles the solutions from these four methods using three hypergraph-based partitioning algorithms. Extensive assessment across 12 datasets, with the number of clusters ranging from 3 to 14 and the number of single cells ranging from 49 to 32,695, showcases the advantages of SAFE-clustering in terms of both cluster number (an 18.9-50.0% reduction in absolute deviation from the truth) and cluster assignment (an average improvement of 28.9%, and up to 34.5%, over the best of the four methods, as measured by the adjusted Rand index). Moreover, SAFE-clustering is computationally efficient enough to accommodate large datasets, taking <10 minutes to process 28,733 cells.
Availability and implementation: SAFE-clustering, including source code and a tutorial, is freely available on the web at http://yunliweb.its.unc.edu/safe/.
Contact: yunli@med.unc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
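To make the ensemble idea concrete, the sketch below shows two ingredients the abstract relies on, under stated assumptions. First, in the usual hypergraph formulation of cluster ensembles, every cluster reported by every input method becomes one hyperedge, so each cell is a vertex belonging to exactly one hyperedge per method. Second, the adjusted Rand index (ARI) scores agreement between a clustering and the true labels while correcting for chance. The function names `hypergraph_incidence` and `adjusted_rand_index` are hypothetical helpers written for illustration. They are not part of the SAFE-clustering package, and the hypergraph partitioning step itself (which SAFE-clustering delegates to its three partitioning algorithms) is not implemented here.

```python
from collections import Counter
from math import comb


def hypergraph_incidence(labelings):
    """Build a binary cell-by-hyperedge incidence matrix (hypothetical helper).

    Each input labeling is one method's cluster assignment for the same cells;
    every distinct cluster of every method becomes one hyperedge (column).
    """
    n = len(labelings[0])
    columns = []
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all methods must label the same cells")
        for cluster in sorted(set(labels)):
            # Column is 1 for cells assigned to this cluster by this method.
            columns.append([1 if lab == cluster else 0 for lab in labels])
    # Transpose so that rows are cells and columns are hyperedges.
    return [list(row) for row in zip(*columns)]


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert and Arabie) between two labelings."""
    n = len(a)
    if n != len(b):
        raise ValueError("labelings must have the same length")
    # Pairs of cells placed together in both labelings (contingency cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    # Pairs placed together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings trivial): treat as perfect agreement.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    method_a = [1, 1, 1, 0, 0, 0]  # same partition as truth, labels permuted
    method_b = [0, 0, 1, 1, 2, 2]  # over-splits into three clusters
    H = hypergraph_incidence([method_a, method_b])
    print(len(H), "cells x", len(H[0]), "hyperedges")
    print(adjusted_rand_index(truth, method_a))
    print(adjusted_rand_index(truth, method_b))
```

Because ARI compares partitions rather than label values, a method that recovers the true grouping under different label names still scores 1.0, which is why it suits comparing methods that name their clusters arbitrarily.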