Identifying Cell Subpopulations and Their Genetic Drivers from Single-Cell RNA-Seq Data Using a Biclustering Approach

Funan Shi,Haiyan Huang
DOI: https://doi.org/10.1089/cmb.2017.0049
IF: 1.549
2017-07-01
Journal of Computational Biology
Abstract:Single-cell RNA-Seq (scRNA-Seq) has attracted much attention recently because it allows unprecedented resolution into cellular activity; the technology, therefore, has been widely applied in studying cell heterogeneity such as the heterogeneity among embryonic cells at varied developmental stages or cells of different cancer types or subtypes. A pertinent question in such analyses is to identify cell subpopulations as well as their associated genetic drivers. Consequently, a multitude of approaches have been developed for clustering or biclustering analysis of scRNA-Seq data. In this article, we present a fast and simple iterative biclustering approach called "BiSNN-Walk" based on the existing SNN-Cliq algorithm. One of BiSNN-Walk's differentiating features is that it returns a ranked list of clusters, which may serve as an indicator of a cluster's reliability. Another important feature is that BiSNN-Walk ranks genes in a gene cluster according to their level of affiliation to the associated cell cluster, making the result more biologically interpretable. We also introduce an entropy-based measure for choosing a highly clusterable similarity matrix as our starting point among a wide selection to facilitate the efficient operation of our algorithm. We applied BiSNN-Walk to three large scRNA-Seq studies, where we demonstrated that BiSNN-Walk was able to retain and sometimes improve the cell clustering ability of SNN-Cliq. We were able to obtain biologically sensible gene clusters in terms of GO term enrichment. In addition, we saw that there was significant overlap in top characteristic genes for clusters corresponding to similar cell states, further demonstrating the fidelity of our gene clusters.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology,computer science, interdisciplinary applications,statistics & probability
What problem does this paper attempt to address?