Abstract

Motivation: The rapid development of single-cell RNA sequencing (scRNA-seq) makes it possible to study the heterogeneity of individual cell characteristics. Cell clustering is a vital procedure in scRNA-seq analysis, providing insight into complex biological phenomena. However, the noisy, high-dimensional and large-scale nature of scRNA-seq data introduces challenges in clustering analysis. Many deep learning-based methods have emerged that learn underlying feature representations while clustering. However, these methods are inefficient at identifying rare cell types and rarely exploit gene dependencies and cell similarity jointly. As a result, they cannot detect a clear cell type structure, which is required for clustering accuracy as well as downstream analysis.

Results: Here, we propose a novel scRNA-seq clustering algorithm called scNAME, which incorporates a mask estimation task for gene pertinence mining and a neighborhood contrastive learning framework for exploiting the intrinsic structure of cells. The pattern learned through mask estimation helps reveal the uncorrupted data structure and denoise the original single-cell data. In addition, the randomly created augmented data introduced in contrastive learning not only improves the robustness of clustering, but also increases the sample size in each cluster for better data capacity. Beyond this, we introduce a neighborhood contrastive paradigm with an offline memory bank, global in scope, which encourages discriminative feature representations and achieves both intra-cluster compactness and inter-cluster separation. The combination of the mask estimation task, neighborhood contrastive learning and the global memory bank designed in scNAME is conducive to rare cell type detection. Experimental results on both simulated and real data confirm that our method is accurate, robust and scalable. We also perform biological analyses, including marker gene identification, gene ontology and pathway enrichment analysis, to validate the biological significance of our method. To the best of our knowledge, we are among the first to introduce a gene relationship exploration strategy, as well as a global cellular similarity repository, in the single-cell field.

Availability and implementation: An implementation of scNAME is available from https://github.com/aster-ww/scNAME.

Supplementary information: Supplementary data are available at Bioinformatics online.
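The two ideas named in the abstract, mask estimation and neighborhood contrastive learning against a memory bank, can be illustrated with a minimal pure-Python sketch. This is not the scNAME implementation: the corruption scheme (replacing a masked entry with the same gene's value from a random cell), the choice of the k most similar bank entries as positives, the softmax-style loss, and all names (`corrupt_with_mask`, `neighborhood_contrastive_loss`, `mask_prob`, `k`, `temperature`) are assumptions made only for illustration.

```python
import math
import random


def corrupt_with_mask(cells, mask_prob, rng):
    """Return (corrupted, mask) for a cells-by-genes matrix given as nested lists.

    Assumed scheme: each entry is, with probability mask_prob, replaced by the
    same gene's value from a randomly chosen cell. mask[i][j] is 1 where a
    replacement was made; a network would be trained to predict this mask.
    """
    n_cells = len(cells)
    corrupted, mask = [], []
    for row in cells:
        new_row, mask_row = [], []
        for j, value in enumerate(row):
            if rng.random() < mask_prob:
                new_row.append(cells[rng.randrange(n_cells)][j])
                mask_row.append(1)
            else:
                new_row.append(value)
                mask_row.append(0)
        corrupted.append(new_row)
        mask.append(mask_row)
    return corrupted, mask


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(a * a for a in vector)) or 1.0
    return [a / norm for a in vector]


def neighborhood_contrastive_loss(embeddings, bank, k, temperature):
    """Average negative log-probability of each cell's k nearest bank entries.

    Assumed formulation: cosine similarities to all memory-bank entries are
    turned into a softmax distribution, and the k most similar entries are
    treated as positives.
    """
    bank_n = [_normalize(b) for b in bank]
    total = 0.0
    for z in embeddings:
        z_n = _normalize(z)
        sims = [sum(a * b for a, b in zip(z_n, b_n)) / temperature for b_n in bank_n]
        top = max(sims)  # subtract the maximum for a numerically stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims))
        positives = sorted(sims, reverse=True)[:k]
        total += -sum(s - log_denom for s in positives) / len(positives)
    return total / len(embeddings)


# Toy usage with synthetic data (values are random, not real expression data).
rng = random.Random(0)
cells = [[float(rng.randint(0, 5)) for _ in range(6)] for _ in range(10)]
corrupted, mask = corrupt_with_mask(cells, 0.3, rng)
embeddings = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]
bank = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
loss = neighborhood_contrastive_loss(embeddings, bank, k=5, temperature=0.5)
```

In a real pipeline the embeddings would come from an encoder trained on the corrupted input, and the bank would hold embeddings of all cells refreshed between epochs; here both are random placeholders so that the sketch runs on its own.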

An interpretable single-cell RNA sequencing data clustering method based on latent Dirichlet allocation

Dirichlet process mixture models for single-cell RNA-seq clustering

Deep Learning for clustering single-cell RNA-seq Data

Integrating Deep Supervised, Self-Supervised and Unsupervised Learning for Single-Cell RNA-seq Clustering and Annotation

scDA: Single cell discriminant analysis for single-cell RNA sequencing data

Single-cell RNA-seq Data Semi-Supervised Clustering and Annotation Via Structural Regularized Domain Adaptation

Analysis of Single-Cell RNA-seq Data by Clustering Approaches

Optimization and Redevelopment of Single-Cell Data Analysis Workflow Based on Deep Generative Models

Single-cell RNA-seq clustering: datasets, models, and algorithms

Machine learning and statistical methods for clustering single-cell RNA-sequencing data

A Fusion Learning Model Based on Deep Learning for Single-Cell RNA Sequencing Data Clustering

Clustering single-cell RNA-seq data with a model-based deep learning approach

Non-negative low-rank representation based on dictionary learning for single-cell RNA-sequencing data analysis

A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study

An active learning approach for clustering single-cell RNA-seq data

Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

A Cell Marker-Based Clustering Strategy (cmcluster) for Precise Cell Type Identification of scRNA-seq Data

scNAME: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data

Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data

Matrix prior for data transfer between single cell data types in latent Dirichlet allocation