Abstract: Single-nucleus joint ATAC- and RNA-sequencing (snMultiome) can be used to identify functionally divergent cell subpopulations within complex samples based on their transcriptomic and epigenetic profiles. Accurate cell type annotation is critical to successful snMultiome data analysis, and several computational methods have been developed for automatic annotation. Traditional cell type annotation methods first cluster cells with unsupervised learning on the gene expression profiles, then label the clusters using aggregated cluster-level expression profiles and marker genes. These methods rely heavily on the clustering results. Because the purity of clusters cannot be guaranteed, false detection of cluster features may lead to incorrect annotations. Further, canonical cell surface markers may not always be suitable for single-nucleus RNA-seq (snRNA-seq) studies, because snRNA-seq generally detects fewer transcripts than typical single-cell RNA-seq (scRNA-seq). Moreover, cell type marker genes in snRNA-seq data may differ from those obtained with scRNA-seq data, reflecting biological differences between the cytoplasmic and nuclear RNA pools. Lastly, data from malignant cells are best left out when establishing cell type reference data because they are too heterogeneous and patient-specific. Reference-based automated algorithms such as SingleR enable quick and unbiased classification by leveraging a collection of built-in human reference data sets (e.g., the Human Primary Cell Atlas (microarray-based) and the combined Blueprint Epigenomics and ENCODE data set (RNA-seq-based)). Still, SingleR may return erroneous cell type classifications. Our data set, generated on the 10x Genomics snMultiome platform, comprises 296,557 nuclei from 82 frozen breast tumors, representing patients of diverse genetic ancestral backgrounds. Using these data, we sought to improve the accuracy of cell type annotation by SingleR.
To achieve this, we first separated malignant and non-malignant cells based on DNA copy number aberrations (aneuploidy) using CopyKAT. For cells determined to be non-malignant, we built a custom reference from the snRNA-seq data set recently made available by the Human Breast Cell Atlas and then applied SingleR with this reference, in which each cell type is represented by single cells of that type, allowing a well-founded estimate of the confidence with which a cell type call can be made. Using this approach, we identified 11 distinct cell types among the non-malignant cells: fibroblast, adipocyte, pericyte, basal, luminal-secretory, luminal-HR, myeloid, mast, vascular, lymphatic, and T-cells, which can then be further subclassified. Furthermore, we interrogated each cluster using known canonical markers and transferred the cell type labels to the snATAC-seq data. This enabled us to link peaks to genes in each cell type. We believe this approach, which refines SingleR with a custom reference, can greatly improve accuracy and minimize misclassification when annotating cell types in breast tumors using snMultiome data.

Citation Format: Huaitian Liu, Alexandra Harris, Brittany Jenkins-Lord, Tiffany H. Dorsey, Francis Makokha, Shahin Sayed, Gretchen Gierach, Stefan Ambs. Cell type annotation using singleR with custom reference for single-nucleus multiome data derived from frozen human breast tumors [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(7_Suppl):Abstract nr LB240.
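The reference-based annotation step described in the abstract can be illustrated with a minimal Python sketch. It is loosely modeled on SingleR's documented scoring (Spearman correlation between a query profile and each reference cell, a per-label score taken as the 0.8 quantile of those correlations, and a "delta" between the best score and the median score across labels). It is not the SingleR implementation and not the authors' pipeline: marker-gene selection, fine-tuning, and outlier-based pruning are omitted, and the function names, the toy expression values, and the `min_delta` cutoff are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return cov / (var_x * var_y) ** 0.5


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def annotate(cell, reference, q=0.8, min_delta=0.05):
    """Label one query profile against a single-cell reference.

    reference: list of (label, expression_profile) pairs, one per reference cell.
    Returns (label or None, best score, delta between best and median score).
    min_delta is a hypothetical fixed cutoff used only in this sketch.
    """
    by_label = defaultdict(list)
    for label, profile in reference:
        by_label[label].append(spearman(cell, profile))
    scores = {lab: quantile(sorted(c), q) for lab, c in by_label.items()}
    best = max(scores, key=scores.get)
    delta = scores[best] - quantile(sorted(scores.values()), 0.5)
    return (best if delta >= min_delta else None), scores[best], delta


# Toy reference: two cells per type, six genes each (values are made up).
reference = [
    ("fibroblast", [9, 8, 1, 0, 1, 0]),
    ("fibroblast", [8, 9, 0, 1, 0, 1]),
    ("T-cell", [0, 1, 9, 8, 1, 0]),
    ("T-cell", [1, 0, 8, 9, 0, 1]),
    ("mast", [1, 0, 0, 1, 9, 8]),
    ("mast", [0, 1, 1, 0, 8, 9]),
]
query = [7, 9, 1, 0, 0, 1]
label, score, delta = annotate(query, reference)
print(label, round(score, 3), round(delta, 3))
```

Keeping individual reference cells rather than one averaged profile per type is what makes the per-label quantile and the delta meaningful as rough confidence measures: a query whose best score barely exceeds the median across labels is left unlabeled instead of being forced into a type.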
