Abstract: The evolution of single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for studying cellular heterogeneity at single-cell resolution. A crucial step in analyzing such data is cell-type annotation, which underpins nearly all downstream analyses in single-cell data mining, and many automatic annotation methods have recently been developed for this task. However, these methods generally assume a fixed total set of cell types, which severely limits the ability of annotation systems to acquire new knowledge continuously. Furthermore, building a unified scRNA-seq annotation system is difficult because the system must progressively expand its knowledge of an ever-growing set of cell-type concepts drawn from a continuous data stream. In response to these challenges, this paper introduces a new and challenging annotation setting, cell-type incremental annotation, in which cell-type knowledge is continually enriched from incoming data. Because samples in the data stream can be observed only once, models in this setting suffer from catastrophic forgetting.

To address this problem, we propose scEVOLVE, an incremental annotation method built on contrastive sample replay combined with the principle of partition confidence maximization. Specifically, we retain a portion of the old data and replay it in each subsequent training phase, and we replace cross-entropy with a prototypical learning objective that mitigates the cell-type imbalance problem. To approximate a model trained jointly on the complete data, we further introduce a cell-type decorrelation strategy that scatters the feature representations of each cell type uniformly.
scEVOLVE is designed to be simple and easy to integrate into most deep softmax-based single-cell annotation methods. Extensive experiments on a range of carefully constructed benchmarks consistently show that it can incrementally learn many cell types over long periods, whereas competing strategies quickly fail. To our knowledge, this is the first end-to-end algorithmic framework proposed and formulated for this new, practical task. scEVOLVE is implemented in Python with the PyTorch machine-learning library and is freely available at https://github.com/aimeeyaoyao/scEVOLVE.
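The three training ingredients named in the abstract (replaying retained old cells, a prototype-based objective in place of a plain softmax layer, and spreading cell-type representations uniformly) can be illustrated with a small NumPy sketch. It is not the scEVOLVE implementation, which is written in PyTorch. The names `ReplayMemory`, `class_prototypes`, `prototype_loss` and `uniformity_loss`, the per-type memory budget, the temperature `tau`, the use of class-mean prototypes, and the use of a pairwise uniformity penalty on prototypes as a stand-in for the cell-type decorrelation strategy are all assumptions made for illustration. Embeddings are taken to be the raw inputs (an identity encoder) so the losses can be computed without any training.

```python
import numpy as np


class ReplayMemory:
    """Hypothetical exemplar memory: keeps up to `per_type` cells of each cell type."""

    def __init__(self, per_type: int, seed: int = 0):
        self.per_type = per_type
        self.rng = np.random.default_rng(seed)
        self.store = {}  # cell-type label -> (k, n_genes) expression matrix

    def add_stage(self, X: np.ndarray, y: np.ndarray) -> None:
        # After a training stage, keep a random subset of each cell type just seen.
        for label in np.unique(y):
            cells = X[y == label]
            k = min(self.per_type, len(cells))
            idx = self.rng.choice(len(cells), size=k, replace=False)
            self.store[int(label)] = cells[idx]

    def replay(self, X_new: np.ndarray, y_new: np.ndarray):
        # Mix the stored old cells into the data of the next training stage.
        if not self.store:
            return X_new, y_new
        X_old = np.concatenate(list(self.store.values()))
        y_old = np.concatenate([np.full(len(v), k) for k, v in self.store.items()])
        return np.concatenate([X_new, X_old]), np.concatenate([y_new, y_old])


def class_prototypes(z: np.ndarray, y: np.ndarray, n_types: int) -> np.ndarray:
    # Mean embedding per cell type (assumes every type in range(n_types) occurs in y).
    return np.stack([z[y == c].mean(axis=0) for c in range(n_types)])


def prototype_loss(z, y, prototypes, tau=0.1) -> float:
    # Softmax over cosine similarities to per-type prototypes; mean negative log-likelihood.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = z @ p.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())


def uniformity_loss(prototypes, t=2.0) -> float:
    # Log mean Gaussian potential over prototype pairs; lower means more spread out.
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sq = ((p[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1)
    i, j = np.triu_indices(len(p), k=1)
    return float(np.log(np.exp(-t * sq[i, j]).mean()))


# Toy stream: stage 1 holds cell types 0 and 1, stage 2 adds types 2 and 3.
rng = np.random.default_rng(1)
X1, y1 = rng.normal(size=(40, 5)), np.repeat([0, 1], 20)
X2, y2 = rng.normal(size=(40, 5)), np.repeat([2, 3], 20)

memory = ReplayMemory(per_type=5)
memory.add_stage(X1, y1)
X_train, y_train = memory.replay(X2, y2)  # 40 new cells + 5 replayed cells per old type

# Identity "encoder" for illustration: embeddings are the inputs themselves.
protos = class_prototypes(X_train, y_train, n_types=4)
print(X_train.shape, np.bincount(y_train))
print("prototype loss:", prototype_loss(X_train, y_train, protos))
print("uniformity loss:", uniformity_loss(protos))
```

After the second stage, the training set holds the 40 new cells plus 5 replayed cells for each of the two old types, so the per-type counts are 5, 5, 20 and 20. Normalizing both embeddings and prototypes removes the weight-norm bias that a learned linear softmax layer tends to develop toward abundant cell types, which is one general motivation for a prototype-based objective under imbalance; the paper's own formulation and justification may differ. In a real model, both losses would be minimized with respect to the encoder (and any learnable prototype) parameters.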

Imbalance and Composition Correction Ensemble Learning Framework (ICCELF): A novel framework for automated scRNA-seq cell type annotation

scEM: A New Ensemble Framework for Predicting Cell Type Composition Based on scRNA-Seq Data

scWECTA: A Weighted Ensemble Classification Framework for Cell Type Assignment Based on Single Cell Transcriptome

CASSIA allows for robust, automated cell annotation in single-cell RNA-sequencing data

Identification of cell types, states and programs by learning gene set representations

EnClaSC: a novel ensemble approach for accurate and robust cell-type classification of single-cell transcriptomes

A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data

scIAE: an integrative autoencoder-based ensemble classification framework for single-cell RNA-seq data

Cell-type composition analysis of scRNA-seq data with deep convolution neural network

scRCA: a Siamese network-based pipeline for the annotation of cell types using imperfect single-cell RNA-seq reference data

CLAIRE: Contrastive Learning-Based Batch Correction Framework for Better Balance Between Batch Mixing and Preservation of Cellular Heterogeneity

SCINA: Semi-Supervised Analysis of Single Cells in silico

scEVOLVE: cell-type incremental annotation without forgetting for single-cell RNA-seq data

A neural network-based method for exhaustive cell label assignment using single cell RNA-seq data

Distribution-Independent Cell Type Identification for Single-Cell RNA-seq Data

CIA: a Cluster Independent Annotation method to investigate cell identities in scRNA-seq data

Learning for single-cell assignment

DCA-CLA: A scRNA-seq Classification Framework Based on Deep Count Autoencoder

scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception

scGAT: A Cell-Type Annotation Framework for Single-Cell Transcriptomics Using Graph Attention Network and Meta Learning

Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data