Abstract: Single-cell RNA sequencing (scRNA-seq) is a powerful technique that provides high-resolution expression profiling of individual cells and has significantly advanced our understanding of cellular diversity and function. Despite its potential, the analysis of scRNA-seq data poses considerable challenges related to multicollinearity, data imbalance, and batch effects. One of the pivotal tasks in single-cell data analysis is cell type annotation, which classifies cells into discrete types based on their gene expression profiles. In this work, we propose a novel modeling formalism for cell type annotation based on supervised contrastive learning, named SCLSC (Supervised Contrastive Learning for Single Cell). Unlike previous uses of contrastive learning in single-cell data analysis, we apply contrastive learning to instance-type pairs rather than instance-instance pairs. More specifically, in the cell type annotation task, contrastive learning is used to learn cell and cell type representations such that cells of the same type cluster together in the new embedding space. Through this approach, the knowledge derived from annotated cells is transferred to the feature representation of scRNA-seq data. Contrasting cells with their types, rather than with other cells, also makes the whole training process more efficient. Our experimental results demonstrate that the proposed SCLSC method consistently achieves superior accuracy in predicting cell types compared to five state-of-the-art methods. SCLSC also performs well in identifying cell types across different batch groups. The simplicity of our method allows for scalability, making it suitable for analyzing datasets with a large number of cells. In a real-world application of SCLSC to monitoring the dynamics of immune cell subpopulations over time, SCLSC demonstrates the ability to discriminate subtypes of CD19+ B cells that were not present in the training dataset.
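The instance-type idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, the encoder, or how cell type representations are built, so everything below is an assumption made for illustration: each type representation is taken to be the normalized mean of its cells' embeddings, the loss is a softmax cross-entropy over cosine similarities between a cell and every type representation, and the names `type_prototypes`, `instance_type_loss`, `predict_types`, and `temperature` are hypothetical. The sketch only evaluates the loss on fixed toy embeddings; it does not train an encoder.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def type_prototypes(embeddings, labels, n_types):
    """Assumed type representation: normalized mean embedding of each type.

    Every type index in range(n_types) must occur at least once in labels.
    """
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(n_types)]
    counts = [0] * n_types
    for z, t in zip(embeddings, labels):
        counts[t] += 1
        for j, x in enumerate(z):
            sums[t][j] += x
    return [normalize([s / counts[t] for s in sums[t]]) for t in range(n_types)]


def instance_type_loss(embeddings, labels, prototypes, temperature=0.1):
    """Mean cross-entropy of each cell against all type prototypes.

    Each cell is compared with n_types prototypes rather than with every
    other cell, so the cost grows with n_cells * n_types.
    """
    total = 0.0
    for z, t in zip(embeddings, labels):
        zn = normalize(z)
        logits = [dot(zn, p) / temperature for p in prototypes]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[t]
    return total / len(embeddings)


def predict_types(embeddings, prototypes):
    """Assign each cell to the type whose prototype is most similar."""
    preds = []
    for z in embeddings:
        zn = normalize(z)
        sims = [dot(zn, p) for p in prototypes]
        preds.append(sims.index(max(sims)))
    return preds


# Toy 2-D "embeddings": three well-separated synthetic cell types.
rng = random.Random(0)
centers = [[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]]
labels = [t for t in range(3) for _ in range(20)]
embeddings = [[c + rng.gauss(0.0, 0.1) for c in centers[t]] for t in labels]

prototypes = type_prototypes(embeddings, labels, 3)
loss = instance_type_loss(embeddings, labels, prototypes)
predicted = predict_types(embeddings, prototypes)
```

In this sketch, annotating a new cell reduces to a nearest-prototype lookup in the embedding space. The per-cell cost depends on the number of types rather than the number of cells, which is one plausible reading of the efficiency and scalability claims in the abstract.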

Large-Scale Cell Representation Learning via Divide-and-Conquer Contrastive Learning

scReader: Prompting Large Language Models to Interpret scRNA-seq Data

Contrastive Learning for Robust Cell Annotation and Representation from Single-Cell Transcriptomics

scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis

scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation

Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data

How do Large Language Models understand Genes and Cells

Predicting cell types with supervised contrastive learning on cells and their types

CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells

Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning

ScCCL: Single-Cell Data Clustering Based on Self-Supervised Contrastive Learning

Scellseg: a Style-Aware Cell Instance Segmentation Tool with Pre-Training and Contrastive Fine-Tuning

Integrating large-scale single-cell RNA sequencing in central nervous system disease using self-supervised contrastive learning

Contrastive learning enables rapid mapping to multimodal single-cell atlas of multimillion scale

The Development of AI Foundation Models for Single-Cell Transcriptomics

Graph Contrastive Learning as a Versatile Foundation for Advanced scRNA-seq Data Analysis

Scaling Dense Representations for Single Cell with Transcriptome-Scale Context

LangCell: Language-Cell Pre-training for Cell Identity Understanding

Network Embedding-Based Representation Learning for Single Cell RNA-seq Data

Parameter-Efficient Fine-Tuning Enhances Adaptation of Single Cell Large Language Model for Cell Type Identification