Abstract

Motivation: The rapid development of single-cell RNA sequencing (scRNA-seq) makes it possible to study the heterogeneity of individual cell characteristics. Cell clustering is a vital procedure in scRNA-seq analysis, providing insight into complex biological phenomena. However, the noisy, high-dimensional and large-scale nature of scRNA-seq data makes clustering analysis challenging. Many deep learning-based methods have emerged that learn underlying feature representations while clustering. However, these methods are inefficient at identifying rare cell types and are barely able to exploit gene dependencies and cell similarity jointly. As a result, they cannot detect a clear cell type structure, which is required for clustering accuracy as well as for downstream analysis.

Results: Here, we propose a novel scRNA-seq clustering algorithm called scNAME, which incorporates a mask estimation task for gene pertinence mining and a neighborhood contrastive learning framework for exploiting the intrinsic structure of cells. The pattern learned through mask estimation helps reveal the uncorrupted data structure and denoise the original single-cell data. In addition, the randomly created augmented data introduced in contrastive learning not only helps improve the robustness of clustering, but also increases the sample size in each cluster for better data capacity. Beyond this, we introduce a neighborhood contrastive paradigm with an offline memory bank, global in scope, which encourages discriminative feature representations and achieves intra-cluster compactness together with inter-cluster separation. The combination of the mask estimation task, neighborhood contrastive learning and the global memory bank in scNAME is conducive to rare cell type detection. Experimental results on both simulated and real data confirm that our method is accurate, robust and scalable. We also perform biological analyses, including marker gene identification, gene ontology and pathway enrichment analysis, to validate the biological significance of our method. To the best of our knowledge, we are among the first to introduce a gene relationship exploration strategy, as well as a global cellular similarity repository, in the single-cell field.

Availability and implementation: An implementation of scNAME is available from https://github.com/aster-ww/scNAME.

Supplementary information: Supplementary data are available at Bioinformatics online.
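
The two ingredients named in the Results, mask estimation and neighborhood contrastive learning against a memory bank, can be illustrated with a minimal sketch. This is not the scNAME implementation: the abstract gives no formulas, so the swap-based corruption scheme, the binary cross-entropy target, the cosine-similarity loss, and every name and parameter below (`corrupt_with_mask`, `mask_bce`, `neighborhood_contrastive_loss`, `mask_prob`, `k`, `tau`) are assumptions chosen for illustration.

```python
import math
import random


def corrupt_with_mask(X, mask_prob=0.3, seed=0):
    """Corrupt a cells-by-genes matrix X (list of lists); return (X_tilde, mask).

    Assumed scheme: each selected entry is replaced by the value of the same
    gene in a randomly chosen cell. mask[i][j] is 1 only where the value
    actually changed, so it is a valid target for a mask estimator.
    """
    rng = random.Random(seed)
    n_cells = len(X)
    X_tilde, mask = [], []
    for row in X:
        new_row, mask_row = [], []
        for j, value in enumerate(row):
            if rng.random() < mask_prob:
                swapped = X[rng.randrange(n_cells)][j]
            else:
                swapped = value
            new_row.append(swapped)
            mask_row.append(1 if swapped != value else 0)
        X_tilde.append(new_row)
        mask.append(mask_row)
    return X_tilde, mask


def mask_bce(pred, mask, eps=1e-7):
    """Mean binary cross-entropy between predicted mask probabilities and mask."""
    total, count = 0.0, 0
    for p_row, m_row in zip(pred, mask):
        for p, m in zip(p_row, m_row):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            total -= m * math.log(p) + (1 - m) * math.log(1 - p)
            count += 1
    return total / count


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighborhood_contrastive_loss(z, bank, k=2, tau=0.5):
    """Loss for one embedding z against a memory bank of stored embeddings.

    Assumed form: the k most similar bank entries are treated as positives,
    all bank entries form the denominator, tau is a temperature.
    """
    sims = [cosine(z, b) for b in bank]
    neighbors = sorted(range(len(bank)), key=lambda i: sims[i], reverse=True)[:k]
    exp_sims = [math.exp(s / tau) for s in sims]
    positives = sum(exp_sims[i] for i in neighbors)
    return -math.log(positives / sum(exp_sims))
```

In a training loop, a network would receive `X_tilde`, predict the mask (scored with `mask_bce`), and produce embeddings that are compared with the memory bank via `neighborhood_contrastive_loss`. How these terms are weighted and how the bank is refreshed is not stated in the abstract.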

scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

ProtHyena: A fast and efficient foundation protein language model at single amino acid Resolution

An End-to-End Deep Hybrid Autoencoder Based Method for Single-Cell RNA-Seq Data Analysis

Sctab: Scaling Cross-Tissue Single-Cell Annotation Models

CELLama: Foundation Model for Single Cell and Spatial Transcriptomics by Cell Embedding Leveraging Language Model Abilities

scLong: A Billion-Parameter Foundation Model for Capturing Long-Range Gene Context in Single-Cell Transcriptomics

SCINA: Semi-Supervised Analysis of Single Cells in silico

scRCA: a Siamese network-based pipeline for the annotation of cell types using imperfect single-cell RNA-seq reference data

scGAA: a general gated axial-attention model for accurate cell-type annotation of single-cell RNA-seq data

scTCA: a hybrid Transformer-CNN architecture for imputation and denoising of scDNA-seq data

scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis

Large-scale foundation model on single-cell transcriptomics

scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation

scGraphformer: unveiling cellular heterogeneity and interactions in scRNA-seq data using a scalable graph transformer network

scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI

Enhanced recovery of single-cell RNA-sequencing reads for missing gene expression data

Generating Synthetic Single Cell Data from Bulk RNA-seq Using a Pretrained Variational Autoencoder

scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing

scNAME: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data

scGPT: toward building a foundation model for single-cell multi-omics using generative AI