Abstract:

MOTIVATION: The rapid development of single-cell RNA sequencing (scRNA-seq) technologies allows us to explore tissue heterogeneity at the cellular level. The identification of cell types plays an essential role in the analysis of scRNA-seq data, which, in turn, influences the discovery of regulatory genes that induce heterogeneity. As the scale of sequencing data increases, the classical approach of combining clustering and differential expression analysis to annotate cells becomes more costly in terms of both labor and resources. Existing supervised classification methods for scRNA-seq data can alleviate this issue by training a classifier on labeled reference data and then predicting labels for the unlabeled target data. However, such a label transfer strategy carries risks, such as susceptibility to batch effects and a further compromise of the inherent discriminability of the target data.

RESULTS: In this article, inspired by unsupervised domain adaptation, we propose a flexible single-cell semi-supervised clustering and annotation framework, scSemiCluster, which integrates the reference data and target data for training. We utilize structure similarity regularization on the reference domain to restrict the clustering solutions of the target domain. We also incorporate pairwise constraints in the feature learning process such that cells belonging to the same cluster are close to each other, and cells belonging to different clusters are far from each other, in the latent space. Notably, without explicit domain alignment and batch effect correction, scSemiCluster outperforms other state-of-the-art single-cell supervised classification and semi-supervised clustering annotation algorithms on both simulated and real data. To the best of our knowledge, we are the first to use both deep discriminative clustering and deep generative clustering techniques in the single-cell field.

AVAILABILITY AND IMPLEMENTATION: An implementation of scSemiCluster is available from https://github.com/xuebaliang/scSemiCluster.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
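
The pairwise-constraint idea in the abstract (same-cluster cells close, different-cluster cells far apart in the latent space) can be illustrated with a minimal NumPy sketch. This is not the scSemiCluster loss: the function name `pairwise_constraint_loss`, the use of cosine similarity, and the binary cross-entropy form are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def pairwise_constraint_loss(z, labels, eps=1e-8):
    """Hypothetical pairwise loss (illustration only, not the authors' formulation).

    z      : (n_cells, latent_dim) latent embeddings
    labels : (n_cells,) cluster or cell-type labels
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    # Row-normalise so the dot product is cosine similarity in [-1, 1].
    z_norm = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    sim = z_norm @ z_norm.T
    # Rescale similarity to (0, 1) and read it as a "same cluster" score.
    p = np.clip((sim + 1.0) / 2.0, eps, 1.0 - eps)
    # Target is 1 for same-label pairs and 0 for different-label pairs.
    same = (labels[:, None] == labels[None, :]).astype(float)
    # Binary cross-entropy per pair, excluding each cell paired with itself.
    bce = -(same * np.log(p) + (1.0 - same) * np.log(1.0 - p))
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    return float(bce[off_diagonal].mean())

# Toy embeddings: two tight groups pointing in opposite directions.
z = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, -0.1]])
loss_good = pairwise_constraint_loss(z, [0, 0, 1, 1])  # labels match the geometry
loss_bad = pairwise_constraint_loss(z, [0, 1, 0, 1])   # labels contradict the geometry
```

Under these assumptions, the loss is lower when the labels agree with the latent geometry than when they contradict it, which is the behaviour a pairwise constraint is meant to reward during feature learning.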

Clustering single-cell RNA sequencing data via iterative smoothing and self-supervised discriminative embedding

Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network

Deep soft K-means clustering with self-training for single-cell RNA sequence data

scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding

Clustering single-cell RNA-seq data with a model-based deep learning approach

Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis

scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive Learning

Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations

scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Deep Learning for clustering single-cell RNA-seq Data

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

Robust scRNA-seq Cell Types Identification by Self-Guided Deep Clustering Network

Single-cell Curriculum Learning-based Deep Graph Embedding Clustering

Deep learning enables accurate clustering and batch effect removal in single-cell RNA-seq analysis

A Fusion Learning Model Based on Deep Learning for Single-Cell RNA Sequencing Data Clustering

Integrating Deep Supervised, Self-Supervised and Unsupervised Learning for Single-Cell RNA-seq Clustering and Annotation

jSRC: a flexible and accurate joint learning algorithm for clustering of single-cell RNA-sequencing data

Single-cell RNA-seq Data Semi-Supervised Clustering and Annotation Via Structural Regularized Domain Adaptation

Single Cell Self-Paced Clustering with Transcriptome Sequencing Data

Machine learning and statistical methods for clustering single-cell RNA-sequencing data

Single-cell RNA-seq clustering: datasets, models, and algorithms