Single-cell RNA-seq Data Semi-Supervised Clustering and Annotation Via Structural Regularized Domain Adaptation

Liang Chen, Qiuyan He, Yuyao Zhai, Minghua Deng
DOI: https://doi.org/10.1093/bioinformatics/btaa908
IF: 5.8
2020-01-01
Bioinformatics
Abstract:

MOTIVATION: The rapid development of single-cell RNA sequencing (scRNA-seq) technologies allows us to explore tissue heterogeneity at the cellular level. The identification of cell types plays an essential role in the analysis of scRNA-seq data, which, in turn, influences the discovery of regulatory genes that induce heterogeneity. As the scale of sequencing data increases, the classical method of combining clustering and differential expression analysis to annotate cells becomes more costly in terms of both labor and resources. Existing supervised classification methods for scRNA-seq data can alleviate this issue by training a classifier on labeled reference data and then predicting labels for the unlabeled target data. However, such a label transfer strategy carries risks, such as susceptibility to batch effects and further compromise of the inherent discrimination of the target data.

RESULTS: In this article, inspired by unsupervised domain adaptation, we propose a flexible single-cell semi-supervised clustering and annotation framework, scSemiCluster, which integrates the reference data and target data for training. We utilize structure similarity regularization on the reference domain to restrict the clustering solutions of the target domain. We also incorporate pairwise constraints in the feature learning process such that cells belonging to the same cluster are close to each other, and cells belonging to different clusters are far from each other in the latent space. Notably, without explicit domain alignment and batch effect correction, scSemiCluster outperforms other state-of-the-art single-cell supervised classification and semi-supervised clustering annotation algorithms on both simulated and real data. To the best of our knowledge, we are the first to use both deep discriminative clustering and deep generative clustering techniques in the single-cell field.

AVAILABILITY AND IMPLEMENTATION: An implementation of scSemiCluster is available from https://github.com/xuebaliang/scSemiCluster.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
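
The pairwise-constraint idea mentioned in the abstract (same-cluster cells pulled together, different-cluster cells pushed apart in the latent space) can be illustrated with a generic contrastive-style loss. This is a minimal sketch under stated assumptions, not the loss used by scSemiCluster: the function name `pairwise_constraint_loss`, the Euclidean distance, the `margin` parameter, and the toy 2-D embeddings are all hypothetical choices made for illustration.

```python
import math


def pairwise_constraint_loss(embeddings, labels, margin=1.0):
    """Generic contrastive-style pairwise loss (illustrative assumption,
    not the scSemiCluster objective).

    Pairs with the same label are penalized by their squared distance;
    pairs with different labels are penalized only when they are closer
    than `margin`.
    """
    total = 0.0
    n_pairs = 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            dist = math.dist(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                # Must-link pair: pull the two cells together.
                total += dist ** 2
            else:
                # Cannot-link pair: push apart until at least `margin` away.
                total += max(0.0, margin - dist) ** 2
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy 2-D latent codes for four cells forming two tight groups.
latent = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (2.1, 2.0)]

# Labels that agree with the geometry give a small loss ...
well_separated = pairwise_constraint_loss(latent, ["T", "T", "B", "B"])
# ... while labels that contradict it give a much larger one.
mixed_up = pairwise_constraint_loss(latent, ["T", "B", "T", "B"])
```

Minimizing a loss of this shape with respect to the encoder parameters would encourage a latent space in which cluster membership and distance agree, which is the qualitative behavior the abstract describes.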