Abstract: Single-cell RNA sequencing (scRNASeq) data plays a major role in advancing our understanding of developmental biology. An important current question is how to classify transcriptomic profiles obtained from scRNASeq experiments into the various cell types and identify the lineage relationships of individual cells. Because of the fast accumulation of datasets and the high dimensionality of the data, it has become challenging to explore and annotate single-cell transcriptomic profiles by hand. To overcome this challenge, automated classification methods are needed. Classical approaches rely on supervised training datasets. However, due to the difficulty of obtaining data annotated at single-cell resolution, we propose instead to take advantage of partial annotations. The partial label learning framework assumes that, for each data point, we can obtain a set of candidate labels that contains the correct one, a simpler setting than requiring a fully supervised training dataset. We study state-of-the-art multi-class classification methods, such as SVM, kNN, prototype-based, logistic regression and ensemble methods, and extend them where needed to the partial label learning framework. Moreover, we study the effect of incorporating the structure of the label set into the methods. We focus particularly on the hierarchical structure of the labels, as commonly observed in developmental processes. We show, on simulated and real datasets, that these extensions make it possible to learn from partially labeled data and to predict with high accuracy, particularly with a nonlinear prototype-based method. We demonstrate that our methods, when trained with partially annotated data, reach the same performance as when trained with fully supervised data. Finally, we study the level of uncertainty present in the partially annotated data, and derive some prescriptive results on the effect of this uncertainty on the accuracy of the partial label learning methods.
Overall, our findings show how hierarchical and non-hierarchical partial label learning strategies can help solve the problem of automated classification of single-cell transcriptomic profiles. Interestingly, these methods rely on a much less stringent type of annotated dataset than fully supervised learning methods do.

Recent years have witnessed an exponential increase in the amount of single-cell RNASeq data generated, particularly in studies of development. One of the major challenges is to identify individual cell types within the data. Expert knowledge is required to identify the relevant marker genes, tissue and timing that enable cell type identification. This information can be difficult to obtain and calls for automated cell type classification approaches. Classical classification techniques would solve this problem by training a classifier on a fully supervised dataset. However, this only pushes the problem further, as a dataset annotated at single-cell resolution is still needed for training. Here we propose instead to take advantage of the partial label learning framework, which lets us train our classifiers on a set of candidate labels per transcriptomic profile. This approach overcomes the need for a training dataset annotated at single-cell resolution. We show that we obtain classification accuracy similar to the fully supervised case. We explore the effect of varying the amount of partially labeled data and of considering the hierarchical structure of the label set (derived from the developmental processes) in the models, on simulated and real biological datasets.
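
To make the partial label setting concrete, the following is a minimal sketch of a kNN classifier trained on candidate label sets rather than single labels. It is not the method evaluated in the paper: the function name `pl_knn_predict`, the boolean candidate matrix, the equal-weight voting rule and the toy data are all assumptions chosen for illustration. Each training point splits one unit of vote evenly across its candidate labels, and a query point receives the label with the most support among its k nearest neighbors.

```python
import numpy as np


def pl_knn_predict(X_train, candidates, X_query, k=3):
    """Predict one label per query point from partially labeled neighbors.

    candidates is a boolean matrix of shape (n_train, n_classes);
    candidates[i, c] is True when class c is in the candidate set of
    training sample i. Every row must contain at least one True.
    (Hypothetical helper, not taken from the paper.)
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)

    # Each training point spreads one unit of vote evenly over its candidates.
    weights = candidates / candidates.sum(axis=1, keepdims=True)

    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)

    # Indices of the k nearest training points for each query.
    nearest = np.argsort(d2, axis=1)[:, :k]

    # Sum the neighbors' votes per class, then pick the best-supported class.
    votes = weights[nearest].sum(axis=1)
    return votes.argmax(axis=1)


# Toy data (assumed): two well-separated groups of profiles, three classes.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
candidates = np.array([
    [1, 1, 0],  # candidate set {0, 1}
    [1, 0, 1],  # candidate set {0, 2}
    [1, 0, 0],  # candidate set {0}
    [0, 1, 1],  # candidate set {1, 2}
    [1, 1, 0],  # candidate set {0, 1}
    [0, 1, 0],  # candidate set {1}
], dtype=bool)
X_query = np.array([[0.2, 0.2], [5.5, 5.5]])

pred = pl_knn_predict(X_train, candidates, X_query, k=3)
print(pred.tolist())  # [0, 1]
```

In this toy example, no single training point in the first group needs to be labeled unambiguously for class 0 to win: the class that recurs across the neighbors' candidate sets accumulates the most votes, which is the intuition behind learning from partial annotations.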

The impacts of active and self-supervised learning on efficient annotation of single-cell expression data

Automatic Cell Type Annotation Using Marker Genes for Single-Cell RNA Sequencing Data

A self-training interpretable cell type annotation framework using specific marker gene

Scgat: A Cell-Type Annotation Framework for Single-Cell Transcriptomics Using Graph Attention Network and Meta Learning

Automated cell profiling in imaging flow cytometry with annotation-efficient learning

AL-Annotator: an Active Learning-based Cervical Cell Annotation System

Integrating Deep Supervised, Self-Supervised and Unsupervised Learning for Single-Cell RNA-seq Clustering and Annotation

Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data

Learning Cell Annotation under Multiple Reference Datasets by Multisource Domain Adaptation

A comparison of automatic cell identification methods for single-cell RNA sequencing data

Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction

scAnnotate: an automated cell-type annotation tool for single-cell RNA-sequencing data

Partial label learning for automated classification of single-cell transcriptomic profiles

Objectively Evaluating the Reliability of Cell Type Annotation Using LLM-Based Strategies

An active learning approach for clustering single-cell RNA-seq data

Point-supervised Single-cell Segmentation via Collaborative Knowledge Sharing

Efficient end-to-end learning for cell segmentation with machine generated weak annotations

Realistic Cell Type Annotation and Discovery for Single-cell RNA-seq Data

scPretrain: multi-task self-supervised learning for cell-type classification

A neural network-based method for exhaustive cell label assignment using single cell RNA-seq data

ELeFHAnt: A supervised machine learning approach for label harmonization and annotation of single cell RNA-seq data