Abstract: The evolution of single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for researchers to inspect cellular heterogeneity with single-cell precision. One crucial aspect of this technology is cell-type annotation, which is fundamental for any subsequent analysis in single-cell data mining. Recently, the scientific community has seen a surge in the development of automatic annotation methods aimed at this task. However, these methods generally assume a fixed total number of cell types, which significantly restricts an annotation system's capacity for continuous knowledge acquisition. Furthermore, creating a unified scRNA-seq annotation system remains difficult because the system must progressively expand its understanding of an ever-increasing set of cell types derived from a continuous data stream. In response to these challenges, this paper presents a novel and challenging setting for annotation, namely cell-type incremental annotation, in which cell-type knowledge is continually extended from incoming data. The difficulty of this task is that samples in the data stream can be observed only once, which leads to catastrophic forgetting of previously learned cell types. To address this problem, we introduce scEVOLVE, an incremental annotation method built upon contrastive sample replay combined with the principle of partition confidence maximization. Specifically, we first retain a portion of the old data and replay it in each subsequent training phase, and then establish a prototypical learning objective, used in place of cross-entropy, to mitigate the cell-type imbalance problem. To approximate a model trained on the complete data at once, we introduce a cell-type decorrelation strategy that scatters the feature representations of each cell type uniformly.
We designed the scEVOLVE framework to be simple and easy to integrate into most deep softmax-based single-cell annotation methods. Thorough experiments on a range of carefully constructed benchmarks consistently show that our method can incrementally learn numerous cell types over an extended period, outperforming other strategies that fail quickly. To the best of our knowledge, this is the first attempt to propose and formulate an end-to-end algorithmic framework for this new, practical task. scEVOLVE is implemented in Python using the PyTorch machine-learning library and is freely accessible at https://github.com/aimeeyaoyao/scEVOLVE.
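
The replay idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the scEVOLVE implementation: it assumes a plain reservoir-sampling buffer in place of contrastive sample replay, and a nearest-mean prototype classifier in place of the deep prototypical learning objective. The names `ReplayBuffer`, `build_prototypes`, and `annotate` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


class ReplayBuffer:
    """Hypothetical buffer keeping at most `per_type` cells per cell type."""

    def __init__(self, per_type, seed=0):
        self.per_type = per_type
        self.rng = random.Random(seed)
        self.store = defaultdict(list)  # cell type -> retained expression vectors
        self.seen = defaultdict(int)    # cell type -> number of cells observed

    def add(self, vector, cell_type):
        # Reservoir sampling: each streamed cell is inspected exactly once,
        # and every cell of a type has the same chance of being retained.
        self.seen[cell_type] += 1
        kept = self.store[cell_type]
        if len(kept) < self.per_type:
            kept.append(list(vector))
        else:
            j = self.rng.randrange(self.seen[cell_type])
            if j < self.per_type:
                kept[j] = list(vector)

    def replay(self):
        # Return the retained old cells as (vector, cell_type) pairs.
        return [(v, t) for t, vs in self.store.items() for v in vs]


def build_prototypes(samples):
    """Compute one mean vector (prototype) per cell type."""
    sums, counts = {}, defaultdict(int)
    for vec, cell_type in samples:
        if cell_type not in sums:
            sums[cell_type] = list(vec)
        else:
            sums[cell_type] = [a + b for a, b in zip(sums[cell_type], vec)]
        counts[cell_type] += 1
    return {t: [x / counts[t] for x in s] for t, s in sums.items()}


def annotate(vec, protos):
    """Assign the cell type whose prototype is closest in squared distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda t: sqdist(vec, protos[t]))


# Stage 1: two known cell types arrive as a stream (toy 2-D "expression" data).
buffer = ReplayBuffer(per_type=2)
for i in range(10):
    buffer.add([0.01 * i, 0.0], "T cell")
    buffer.add([5.0, 5.0 + 0.01 * i], "B cell")

# Stage 2: a new cell type arrives; old data is available only via the buffer.
stage2 = [([10.0, 0.0], "NK cell"), ([10.2, 0.1], "NK cell")]
protos = build_prototypes(buffer.replay() + stage2)
labels = [annotate(v, protos) for v in ([0.1, 0.0], [5.0, 5.1], [9.9, 0.0])]
```

Because each cell type contributes exactly one prototype regardless of how many cells were retained for it, this toy classifier is less sensitive to the imbalance between the few replayed old cells and the many new ones, which is the motivation the abstract gives for preferring a prototypical objective over cross-entropy.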

Token-Level Self-Evolution Training for Sequence-to-Sequence Learning

Self-Evolution Learning for Discriminative Language Model Pretraining.

Self-Evolution Fine-Tuning for Policy Optimization

Better Simultaneous Translation with Monotonic Knowledge Distillation.

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation

SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation

Token-Level Fitting Issues of Seq2seq Models

SELF: Self-Evolution with Language Feedback

Evolving Subnetwork Training for Large Language Models

Sustainable Self-evolution Adversarial Training

MetaRL-SE: a few-shot speech enhancement method based on meta-reinforcement learning

Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens

Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models

SEP: Self-Enhanced Prompt Tuning for Visual-Language Model

SeiT++: Masked Token Modeling Improves Storage-efficient Training

SETA: Semantic-Aware Token Augmentation for Domain Generalization

Self-augmented sequentiality-aware encoding for aspect term extraction

Semiparametric Token-Sequence Co-Supervision

scEVOLVE: cell-type incremental annotation without forgetting for single-cell RNA-seq data

Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation