Abstract: Single-nucleus joint ATAC- and RNA-sequencing (snMultiome) can be used to identify functionally divergent cell subpopulations within complex samples based on their transcriptomic and epigenetic profiles. Accurate cell type annotation is critical to successful snMultiome data analysis, and several computational methods have been developed for automatic annotation. Traditional cell type annotation methods first cluster cells by unsupervised learning on their gene expression profiles, then label the clusters using aggregated cluster-level expression profiles and marker genes. These methods rely heavily on the clustering results: because cluster purity cannot be guaranteed, falsely detected cluster features may lead to incorrect annotations. Further, canonical cell surface markers may not always be suitable for single-nucleus RNA-seq (snRNA-seq) studies, because snRNA-seq generally detects fewer transcripts than typical single-cell RNA-seq (scRNA-seq). Moreover, cell type marker genes in snRNA-seq data may differ from those obtained with scRNA-seq data, reflecting biological differences between the cytoplasmic and nuclear RNA pools. Lastly, data obtained from malignant cells are best left out when establishing cell type reference data, because they are too heterogeneous and patient-specific. Reference-based automated algorithms such as SingleR enable quick and unbiased classification by leveraging a collection of built-in human reference data sets (e.g., the microarray-based Human Primary Cell Atlas and the RNA-seq-based combined Blueprint Epigenomics and ENCODE data set). Still, SingleR may return erroneous cell type classifications. Our data set was generated with the 10x Genomics snMultiome platform and comprises 296,557 nuclei from 82 frozen breast tumors, representing patients from diverse genetic ancestral backgrounds. Using these data, we sought to improve the accuracy of cell type annotation by SingleR.
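SingleR scores a query cell against every reference cell of each label and summarizes those correlations per label, which is why a reference in which each cell type is represented by individual cells matters. The Python sketch below illustrates that scoring idea under simplifying assumptions: expression vectors are already aligned to the same genes, marker-gene selection and SingleR's fine-tuning step are omitted, and the per-label score is the 0.8 quantile of Spearman correlations (mirroring SingleR's default `quantile = 0.8`). The function names (`rank`, `spearman`, `quantile`, `score_cell`, `annotate`) are hypothetical illustrations, not SingleR's R/Bioconductor API.

```python
from typing import Dict, List, Sequence, Tuple

Reference = Dict[str, List[Sequence[float]]]  # label -> reference cell profiles


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive their average rank (as in Spearman's rho)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 by convention if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between sorted values."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def score_cell(query: Sequence[float], reference: Reference, q: float = 0.8) -> Dict[str, float]:
    """Per-label score: the q-quantile of correlations with that label's reference cells."""
    return {
        label: quantile([spearman(query, cell) for cell in cells], q)
        for label, cells in reference.items()
    }


def annotate(query: Sequence[float], reference: Reference, q: float = 0.8) -> Tuple[str, Dict[str, float]]:
    """Assign the highest-scoring label (no fine-tuning or pruning in this sketch)."""
    scores = score_cell(query, reference, q)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy reference: two hypothetical labels, two reference cells each, four genes.
    toy_reference: Reference = {
        "T cell": [[9, 8, 1, 0], [8, 9, 0, 1]],
        "fibroblast": [[0, 1, 9, 8], [1, 0, 8, 9]],
    }
    best, per_label = annotate([7, 9, 2, 1], toy_reference)
    print(best, per_label)
```

In the toy example the query is assigned to "T cell" (score 0.8, versus about -0.68 for "fibroblast"). Taking an upper quantile rather than the maximum is intended to keep a single atypical reference cell from dominating a label's score while still tolerating heterogeneity within that label.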
To improve the accuracy of SingleR annotation, we first separated malignant from non-malignant cells based on DNA copy number aberrations (aneuploidy) using CopyKAT. For cells determined to be non-malignant, we built a custom reference from the snRNA-seq data set recently made available by the Human Breast Cell Atlas, and then applied SingleR with this custom reference, in which each cell type is represented by single cells of that type, allowing a well-founded estimate of the confidence with which a cell type call can be made. Using this approach, we identified 11 distinct cell types among non-malignant cells (fibroblast, adipocyte, pericyte, basal, luminal-secretory, luminal-HR, myeloid, mast, vascular, lymphatic, and T cells), which can be further subclassified. We then interrogated each cluster using known canonical markers and transferred the cell type labels to the snATAC-seq data. This approach enabled us to link peaks to genes in each cell type. We believe this approach, which refines SingleR, can greatly improve accuracy and minimize misclassification when annotating cell types in breast tumors using snMultiome data.
Citation Format: Huaitian Liu, Alexandra Harris, Brittany Jenkins-Lord, Tiffany H. Dorsey, Francis Makokha, Shahin Sayed, Gretchen Gierach, Stefan Ambs. Cell type annotation using singleR with custom reference for single-nucleus multiome data derived from frozen human breast tumors [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(7_Suppl):Abstract nr LB240.
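The workflow described in the abstract above (CopyKAT-based separation, reference-based annotation of non-malignant nuclei, and label transfer to snATAC-seq) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: CopyKAT-style calls are given as a barcode-to-string mapping using the values `"aneuploid"` and `"diploid"` (any other call is left unlabeled), aneuploid nuclei are simply tagged `"malignant"`, and both modalities are assumed to share one barcode per nucleus, as in 10x Genomics Multiome output. `annotate_multiome` and its `classify` callback are hypothetical names; any per-nucleus classifier, such as a SingleR-style correlation scorer, can be passed as `classify`.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence


def annotate_multiome(
    copykat_calls: Dict[str, str],
    rna_profiles: Dict[str, Sequence[float]],
    classify: Callable[[Sequence[float]], str],
    atac_barcodes: Iterable[str],
) -> Dict[str, Optional[str]]:
    """Label nuclei from RNA, then carry the labels over to ATAC via shared barcodes.

    Hypothetical helper: assumes "aneuploid" means malignant and "diploid" means
    non-malignant; any other call, or a missing RNA profile, yields None.
    """
    rna_labels: Dict[str, Optional[str]] = {}
    for barcode, call in copykat_calls.items():
        if call == "aneuploid":
            # Malignant nuclei are kept out of the reference-based annotation.
            rna_labels[barcode] = "malignant"
        elif call == "diploid" and barcode in rna_profiles:
            # Only non-malignant nuclei are scored against the custom reference.
            rna_labels[barcode] = classify(rna_profiles[barcode])
        else:
            rna_labels[barcode] = None
    # In a multiome experiment the RNA and ATAC profiles of a nucleus share a
    # barcode, so label transfer reduces to a lookup by barcode.
    return {barcode: rna_labels.get(barcode) for barcode in atac_barcodes}


def toy_classifier(profile: Sequence[float]) -> str:
    """Stand-in for a reference-based annotator (hypothetical two-gene rule)."""
    return "fibroblast" if profile[1] > profile[0] else "T cell"


if __name__ == "__main__":
    calls = {"AAAC-1": "aneuploid", "CCGT-1": "diploid", "GGTA-1": "not.defined"}
    profiles = {"AAAC-1": [5.0, 1.0], "CCGT-1": [0.0, 4.0], "GGTA-1": [2.0, 2.0]}
    atac = ["AAAC-1", "CCGT-1", "GGTA-1", "TTTT-1"]
    print(annotate_multiome(calls, profiles, toy_classifier, atac))
```

Nuclei without an RNA-derived label, such as `"TTTT-1"` in the toy example, map to `None`, which keeps unannotated ATAC nuclei explicit instead of silently dropping them before peak-to-gene linking.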

Cell type matching in single-cell RNA-sequencing data using FR-Match

FR-Match: robust matching of cell type clusters from single cell RNA sequencing data using the Friedman–Rafsky non-parametric test

Alignment of single-cell RNA-seq samples without overcorrection using kernel density matching

Reference-based cell type matching of spatial transcriptomics data

ClusterMatch aligns single-cell RNA-sequencing data at the multi-scale cluster level via stable matching

Automated Cell Type Annotation with Reference Cluster Mapping

Reference-based cell type matching of in situ image-based spatial transcriptomics data on primary visual cortex of mouse brain

Highly Accurate Estimation of Cell Type Abundance in Bulk Tissues Based on Single‐Cell Reference and Domain Adaptive Matching

Robust single-cell matching and multimodal analysis using shared and distinct features

A Strategy to Compare Single-Cell RNA Sequencing Data Sets Provides Phenotypic Insight into Cellular Heterogeneity Underlying Biological Similarities and Differences Between Samples

Abstract LB240: Cell type annotation using singleR with custom reference for single-nucleus multiome data derived from frozen human breast tumors

A Cell Marker-Based Clustering Strategy (cmcluster) for Precise Cell Type Identification of Scrna-Seq Data

Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data

Abstract 3520: A scalable single cell RNA-seq pipeline leveraging machine learning and high-quality references for cell-type prediction

Cell Type Differentiation Using Network Clustering Algorithms

Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction

Cell type matching across species using protein embeddings and transfer learning

scClassify: sample size estimation and multiscale classification of cells using single and multiple reference

Integration for single-cell RNA sequencing data based on the shared cell type assignment

CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq

MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics